For the purpose of this page, we use the terms “rows” and “cases” interchangeably to refer to the entries of the dataset. In many of the steps taken to deduplicate entries in the C1 dataset, we used unstandardized columns, or other data that were in fact duplicated within HASHs and did not depend on treatment-related events. To find and delete duplicated data that add no information relevant to the study, we can now use these standardized variables as criteria to reach the goal of a unique event per HASH, reducing the complexity that stems from irrelevant differences.


As stated in the third part of the deduplication process, we identified and defined a number of days between treatments that would be suitable for linking these entries, together with several additional criteria to distinguish a different treatment from the continuation of a treatment. In this stage, we defined rules to keep the most relevant information from each variable, collapsing the intermediate events into a single entry that summarizes the whole treatment and lets us distinguish subsequent treatments.

Structure of Treatments and Rules to Collapse Continuous Entries

We first got a general impression of the database to understand which steps to follow to collapse entries into differentiated treatments. To do so, we looked at the relationship between each entry and the one that followed it.


#https://stackoverflow.com/questions/46750364/diagrammer-and-graphviz
#https://mikeyharper.uk/flowcharts-in-r-using-diagrammer/
#http://blog.nguyenvq.com/blog/2012/05/29/better-decision-tree-graphics-for-rpart-via-party-and-partykit/
#http://blog.nguyenvq.com/blog/2014/01/17/skeleton-to-create-fast-automatic-tree-diagrams-using-r-and-graphviz/
#https://cran.r-project.org/web/packages/DiagrammeR/vignettes/graphviz-mermaid.html
#https://stackoverflow.com/questions/39133058/how-to-use-graphviz-graphs-in-diagrammer-for-r
#https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781789802566/1/ch01lvl1sec21/creating-diagrams-via-the-diagrammer-package
#https://justlegal.be/2019/05/using-flowcharts-to-display-legal-procedures/
#   #   [3]:  paste0('Only applications w/ only one\\n application in the same date \\n(n = ', formatC(nrow(CONS_C1_df_dup_JUN_2020), format='f', big.mark=',', digits=0), ';\\n users:',formatC(CONS_C1_df_dup_JUN_2020%>% dplyr::distinct(hash_key)%>% nrow(), format='f', big.mark=',', digits=0),')')
   #   [4]:  paste0('Dataset \\n(n = ',formatC(comb_datasets_a_n,format='f', big.mark=',', digits=0),'\\nusers =',formatC(comb_datasets_a_users,format='f', big.mark=',', digits=0),')')
   #   [5]:  paste0('Dataset \\n(n = ',formatC(comb_datasets_b_n,format='f', big.mark=',', digits=0),'\\nusers =',formatC(comb_datasets_b_users,format='f', big.mark=',', digits=0),')')
   #   [6]:  paste0('Dataset \\n(n = ',formatC(comb_datasets_c_n,format='f', big.mark=',', digits=0),'\\nusers =',formatC(comb_datasets_c_users,format='f', big.mark=',', digits=0),')')
   #   [7]:  paste0('Final Sample \\n(n = ', formatC(nrow(CONS_C1_df_dup_JUN_2020_match_top_sel), format='f', big.mark=',', digits=0), ';\\n users: ',formatC(CONS_C1_df_dup_JUN_2020_match_top_sel%>% dplyr::distinct(hash_key)%>% nrow(), format='f', big.mark=',', digits=0),')')
#
#    #  tab3 [label = '@@3']
    #  tab7 [label = '@@7']
   #  blank [label = '', width = 0.001, height = 0.001]
#
#    # blank -> tab3[ dir = none,  color = 'white',fontcolor = white,shape=none, width=0, height=0];
    # tab3 -> tab4 [label=paste0('Some users had events fullfilling both conditions (n=',tab6_lab_users+tab5_lab_users-tab4_lab_users')',fontsize = 9];
    #  tab6 -> tab7 [label='  Only rows with available data on TOP scores and Diagnostic of CIE-10',fontsize = 9];
tab1_lab<- paste0('C1 Dataset \n(n = ', formatC(nrow(CONS_C1_df_dup_JUN_2020), format='f', big.mark=',', digits=0), ';\nusers: ',formatC(CONS_C1_df_dup_JUN_2020%>% dplyr::distinct(hash_key)%>% nrow(), format='f', big.mark=',', digits=0),')')

tab2_lab<-paste0('Cases of users that had at least two entries \n(n = ', CONS_C1_df_dup_JUN_2020%>% dplyr::group_by(hash_key)%>%   dplyr::mutate(sum_validos=sum(!is.na(diff_bet_treat)))%>%
  ungroup()%>%  dplyr::filter(sum_validos>0)%>% nrow()%>% formatC(big.mark=","),';\nusers =',  CONS_C1_df_dup_JUN_2020%>% dplyr::group_by(hash_key)%>%   dplyr::mutate(sum_validos=sum(!is.na(diff_bet_treat)))%>%  ungroup()%>%  dplyr::filter(sum_validos>0)%>% distinct(hash_key)%>% nrow()%>% formatC(big.mark=","),')')

tab3_lab<-paste0('Only entries w/ an entry\n that followed another one \n(n = ', CONS_C1_df_dup_JUN_2020%>% dplyr::filter(!is.na(diff_bet_treat))%>%nrow()%>% formatC(big.mark=","),';\nusers =',CONS_C1_df_dup_JUN_2020%>% dplyr::filter(!is.na(diff_bet_treat))%>%distinct(hash_key)%>% nrow()%>% formatC(big.mark=","),')')

            tab4_lab_n<-CONS_C1_df_dup_JUN_2020%>% 
              #dplyr::filter(!is.na(diff_bet_treat))%>%
              dplyr::mutate(filter_complex= dplyr::case_when(diff_bet_treat<45& as.numeric(motivoegreso_derivacion)==2~1,TRUE~0))%>%
              dplyr::mutate(filter_complex2= dplyr::case_when(diff_bet_treat<60& as.numeric(motivoegreso_derivacion)==1~1,TRUE~0))%>%
              dplyr::filter(filter_complex==1|filter_complex2==1)%>%
              #dplyr::select(hash_key,motivoegreso_derivacion,diff_bet_treat)
              nrow()
            
            tab4_lab_users<-CONS_C1_df_dup_JUN_2020%>% 
              #dplyr::filter(!is.na(diff_bet_treat))%>%
              dplyr::mutate(filter_complex= dplyr::case_when(diff_bet_treat<45& as.numeric(motivoegreso_derivacion)==2~1,TRUE~0))%>%
              dplyr::mutate(filter_complex2= dplyr::case_when(diff_bet_treat<60& as.numeric(motivoegreso_derivacion)==1~1,TRUE~0))%>%
              dplyr::filter(filter_complex==1|filter_complex2==1)%>%
              #dplyr::select(hash_key,motivoegreso_derivacion,diff_bet_treat)
              distinct(hash_key)%>% nrow()
            
            tab5_lab_n<-CONS_C1_df_dup_JUN_2020%>% 
              #dplyr::filter(!is.na(diff_bet_treat))%>%
              dplyr::mutate(filter_complex= dplyr::case_when(diff_bet_treat<45& as.numeric(motivoegreso_derivacion)==2~1,TRUE~0))%>% # "<45" to match the node label and tab5_lab_users
              dplyr::mutate(filter_complex2= dplyr::case_when(diff_bet_treat<60& as.numeric(motivoegreso_derivacion)==1~1,TRUE~0))%>%
              dplyr::filter(filter_complex==1)%>%
              #dplyr::select(hash_key,motivoegreso_derivacion,diff_bet_treat)
              nrow()
            
            tab5_lab_users<-CONS_C1_df_dup_JUN_2020%>% 
              #dplyr::filter(!is.na(diff_bet_treat))%>%
              dplyr::mutate(filter_complex= dplyr::case_when(diff_bet_treat<45& as.numeric(motivoegreso_derivacion)==2~1,TRUE~0))%>%
              dplyr::mutate(filter_complex2= dplyr::case_when(diff_bet_treat<60& as.numeric(motivoegreso_derivacion)==1~1,TRUE~0))%>%
              dplyr::filter(filter_complex==1)%>%
              #dplyr::select(hash_key,motivoegreso_derivacion,diff_bet_treat)
              distinct(hash_key)%>% nrow()
            
            tab6_lab_n<-CONS_C1_df_dup_JUN_2020%>% 
              #dplyr::filter(!is.na(diff_bet_treat))%>%
              dplyr::mutate(filter_complex= dplyr::case_when(diff_bet_treat<45& as.numeric(motivoegreso_derivacion)==2~1,TRUE~0))%>%
              dplyr::mutate(filter_complex2= dplyr::case_when(diff_bet_treat<60& as.numeric(motivoegreso_derivacion)==1~1,TRUE~0))%>%
              dplyr::filter(filter_complex2==1)%>%
              #dplyr::select(hash_key,motivoegreso_derivacion,diff_bet_treat)
              nrow()
            
            tab6_lab_users<-CONS_C1_df_dup_JUN_2020%>% 
              #dplyr::filter(!is.na(diff_bet_treat))%>%
              dplyr::mutate(filter_complex= dplyr::case_when(diff_bet_treat<45& as.numeric(motivoegreso_derivacion)==2~1,TRUE~0))%>%
              dplyr::mutate(filter_complex2= dplyr::case_when(diff_bet_treat<60& as.numeric(motivoegreso_derivacion)==1~1,TRUE~0))%>%
              dplyr::filter(filter_complex2==1)%>%
              #dplyr::select(hash_key,motivoegreso_derivacion,diff_bet_treat)
              distinct(hash_key)%>% nrow()
            tab7_lab<- paste0('* Some users had events fulfilling both conditions (n=',tab6_lab_users+tab5_lab_users-tab4_lab_users,')')

tab4_lab<-paste0('Only entries w/ an entry\n that followed another one\n(both conditions)\n(n = ', tab4_lab_n%>% formatC(big.mark=","),';\nusers =',tab4_lab_users%>% formatC(big.mark=","),')*')

tab5_lab<-paste0('Only entries w/ an entry\n that followed another one \n(< 45 days of difference w/ a posterior entry &\nReferral as a cause of discharge)\n(n = ', tab5_lab_n%>% formatC(big.mark=","),';\nusers =',tab5_lab_users%>% formatC(big.mark=","),')')

tab6_lab<-paste0('Only entries w/ an entry\n that followed another one \n(< 60 days of difference w/ a posterior entry &\nNot a Referral as a cause of discharge)\n(n = ', tab6_lab_n%>% formatC(big.mark=","),';\nusers =',tab6_lab_users%>% formatC(big.mark=","),')')
          
DiagrammeR::grViz("
digraph graph2 {

graph [layout = dot]

# node definitions with substituted label text
node [shape = rectangle, width = 5, fillcolor = Beige]
a [label = '@@1']
b [label = '@@2']
c [label = '@@3']
d [label = '@@4']
e [label = '@@5', fontcolor = MidnightBlue, color = MidnightBlue]
f [label = '@@6']
g [label = '@@7', width = 0.001, height = 0.001, color=White]

a -> b 
b -> c 
c -> d #[label= paste0('** Some users had events fullfilling both conditions (n=',tab6_lab_users+tab5_lab_users-tab4_lab_users,')'),fontsize = 9];
d -> {e f} 
{e f} -> g [ dir = none,  color = 'white',fontcolor = white,shape=none, width=0, height=0];

}

[1]:  tab1_lab
[2]:  tab2_lab
[3]:  tab3_lab
[4]:  tab4_lab
[5]:  tab5_lab
[6]:  tab6_lab
[7]:  tab7_lab
")

Figure 1. Decision Tree for the Users with more than one entry

#[label=paste0('Some users had events fullfilling both conditions (n=',tab6_lab_users+tab5_lab_users-tab4_lab_users,')',fontsize = 9];


As seen in Figure 1, we could establish that these pairs of events from the same user could correspond to a continuous treatment rather than to different ones. We focused on these patterns to collapse them into treatments, particularly cases with a referral in the first entry and less than 45 days of difference with a subsequent entry.
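The linking rule described above can be sketched in base R on toy data. This is only an illustration under stated assumptions, not the procedure actually applied to the C1 dataset: the toy rows, the `referral` flag, and the names `treatment_id` and `collapsed` are hypothetical, and the thresholds (less than 45 days after a referral, less than 60 days otherwise) are taken from the filters used for Figure 1.

```r
# Toy entries: one row per entry (hypothetical data, not from the C1 dataset)
entries <- data.frame(
  hash_key   = c("a", "a", "a", "b", "b"),
  fech_ing   = as.Date(c("2015-01-01", "2015-04-10", "2016-06-01",
                         "2015-02-01", "2015-09-01")),
  fech_egres = as.Date(c("2015-03-31", "2015-08-15", "2016-09-01",
                         "2015-05-01", "2015-12-01")),
  referral   = c(TRUE, FALSE, FALSE, FALSE, FALSE)  # discharge cause was a referral
)
entries <- entries[order(entries$hash_key, entries$fech_ing), ]

# Compare each entry with the previous row (only meaningful within the same user)
n          <- nrow(entries)
same_user  <- c(FALSE, entries$hash_key[-1] == entries$hash_key[-n])
prev_egres <- c(as.Date(NA), entries$fech_egres[-n])
prev_ref   <- c(NA, entries$referral[-n])

# Days between the previous discharge and this admission
gap   <- as.numeric(entries$fech_ing - prev_egres)
# Assumed thresholds: <45 days after a referral, <60 days otherwise
limit <- ifelse(prev_ref, 45, 60)

# An entry continues the previous one when it is the same user and the gap is short
continues <- same_user & !is.na(gap) & gap < limit
entries$treatment_id <- cumsum(!continues)

# Collapse each linked group into one row summarizing the whole treatment
collapsed <- do.call(rbind, lapply(split(entries, entries$treatment_id), function(d) {
  data.frame(hash_key   = d$hash_key[1],
             fech_ing   = min(d$fech_ing),
             fech_egres = max(d$fech_egres),
             n_entries  = nrow(d))
}))
collapsed
```

In this toy example, the first two entries of user "a" are 10 days apart after a referral, so they are merged into one treatment; every other entry stays as its own treatment.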


invisible(c("1. That the referrals actually ended up being referred"))
invisible(c("2. False referrals, when the first treatment is a perfect transfer and is considered a SENDA No"))
invisible(c("3. Check one by one the cases that have 1 day of treatment and in which one is SENDA No and the other SENDA Yes"))
invisible(c("4. Collapse into a single record those records of the same user that show a difference of less than 45 days for referrals and 60 days for the remaining causes of discharge, between the discharge date and the admission date of the following treatment (depending on what we agree), and in which the only change recorded between one treatment and the next is the change of the center ID."))
invisible(c("5. Generate a variable with concatenated treatments"))
invisible(c("6. What do I do with treatments with NAs in the discharge date? I should delete them"))

invisible(c("referrals that have a subsequent treatment: group the entries that have a difference of 45 days or less"))

 #CONS_C1_df_dup_JUN_2020%>% dplyr::mutate(motivodeegreso_mod_imp_tidy= case_when(!is.na(diff_bet_treat) & as.character(motivodeegreso_mod_imp)=="Derivación" & grepl("Clínica",tipo_centro_derivacion==<90~"Abandono Temprano" , TRUE~as.character(motivodeegreso_mod_imp))%>% nrow()

#- if less
#one could argue that a late withdrawal may actually span at least 1 month of treatment. Following the criteria stated in the annex and the terminological glossary, 
#referrals should span up to 45 days.
#
#if there are more than 1095 days 

#menor_60_dias_diff
#motivoegreso_derivacion
#obs_cambios_ninguno

#has invalid cases
CONS_C1_df_dup_JUN_2020%>% 
  dplyr::filter(!is.na(diff_bet_treat))%>%
#  dplyr::group_by(hash_key)%>%
#  dplyr::mutate(sum_validos=sum(!is.na(diff_bet_treat)))%>%
#  ungroup()%>%
#  dplyr::filter(sum_validos>0)%>%
  dplyr::mutate(menor_45_dias_diff=ifelse(diff_bet_treat<45,1,0))%>%
  janitor::tabyl(menor_45_dias_diff,motivoegreso_derivacion)%>%
  adorn_totals("col") %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns()%>%
  knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
               caption="Table 1. Diff. in Treatments <45 days, by Referral (only cases that had an entry after another one)",
               align= c("l",rep('c', 5)), col.names = c("Diff. in Treatments <45 days","Not a Referral","Referral", "Total"))%>%
  
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size= 8)%>%
        kableExtra::add_footnote(paste0("Note= Percentages by Column; Cases with an entry that follows them (n= ",CONS_C1_df_dup_JUN_2020%>% dplyr::filter(!is.na(diff_bet_treat))%>%nrow()%>% formatC(big.mark=","),"; users=",CONS_C1_df_dup_JUN_2020%>% dplyr::filter(!is.na(diff_bet_treat))%>%distinct(hash_key)%>% nrow()%>% formatC(big.mark=","),")"), notation = "none")%>%
  kableExtra::scroll_box(width= "100%", height = "250px")
Table 1. Diff. in Treatments <45 days, by Referral (only cases that had an entry after another one)

| Diff. in Treatments <45 days | Not a Referral | Referral | Total |
|---|---|---|---|
| 0 | 86.4% (18772) | 31.2% (3084) | 69.1% (21856) |
| 1 | 13.6% (2944) | 68.8% (6809) | 30.9% (9753) |

Note= Percentages by Column; Cases with an entry that follows them (n= 31,609; users=20,524)


As seen in Table 1, most referrals that were followed by a subsequent treatment (68.8%) had a difference of less than 45 days, compared with 13.6% for the other causes of discharge. Considering that, we decided to look at how long it took to report another entry among users with different causes of discharge in a previous treatment.
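As a quick check, the column percentages in Table 1 can be recomputed from the printed counts with base R. This is a minimal sketch that uses only the counts shown in the table; the object names `counts` and `col_pct` are illustrative.

```r
# Counts as printed in Table 1 (rows: difference <45 days, 0 = no, 1 = yes)
counts <- matrix(c(18772, 2944, 3084, 6809), nrow = 2,
                 dimnames = list(c("0", "1"), c("Not a Referral", "Referral")))

# Add the "Total" column, then compute percentages within each column
counts  <- cbind(counts, Total = rowSums(counts))
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
col_pct
```

Each column sums to 100, so the "Referral" column reads as: of all referrals followed by another entry, the share with a gap under 45 days.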


#http://rstudio-pubs-static.s3.amazonaws.com/316989_83cbe556125645b698c9ff6cf88c4c1a.html
#https://thriv.github.io/biodatasci2018/r-survival.html
#http://si.biostat.washington.edu/sites/default/files/modules/SISCR_2018_11_all-2pp_0.pdf
#https://www.researchgate.net/profile/Claudia_Castro-Kuriss/publication/325390160_Analisis_de_Supervivencia_mediante_el_empleo_de_R/links/5b0aba27a6fdcc8c25333860/Analisis-de-Supervivencia-mediante-el-empleo-de-R.pdf?origin=publication_detail
#http://www.sthda.com/english/wiki/survival-analysis-basics

#SURVIVAL= Explores factors that are thought to influence the chance that the event occurs
#Censored data= may have different causes:
    #- The patient did not report an event (the readmission) during the study, and we do not know whether the event occurred afterwards. THESE ARE THE ONES I HAVE TO GIVE A DIFF TREAT UP TO TODAY, AS LONG AS THEY HAVE LESS THAN 1095 DAYS FOR MISSING DISCHARGE DATES, AND ARE NOT RIGHT-TRUNCATED BECAUSE THEIR FIRST TREATMENT DID NOT END.
    #This censoring can occur when a user drops out of a study, is lost to follow-up, or has not experienced the event by the time the study ends
    #- Right-truncated: those lost for some reason. Right-truncated
    #Samples with random censoring are generally considered right-censored because the failure times of the different units are incorporated progressively
    #Cases that did not experience the event during the study period will be censored at the last recorded time
    # A less restrictive assumption than independence between Ci and Ti, but one that suffices for the methods to be valid, is “independent censoring” or “non-informative censoring”: the probability that an individual is censored at time t0 does not depend on that individual having an unusually high (or low) risk of the event.
    #Censoring may arise in the following ways:
    ###a patient has not (yet) experienced the event of interest, such as relapse or death, within the study time period;
    ###a patient is lost to follow-up during the study period;
    ###a patient experiences a different event that makes further follow-up impossible.
    #This type of censoring, named right censoring, is handled in survival analysis.

#– Recurrence rate
survfit_days_new_treat<-survfit(Surv(diff_bet_treat, status) ~ motivodeegreso_mod_imp, 
                                data=CONS_C1_df_dup_JUN_2020%>% 
                                  dplyr::mutate(diff_nas_fech_egres= as.numeric(difftime(lubridate::ymd("2019-11-13"),fech_ing, units = "days")))%>%
                                  #dplyr::filter(is.na(fech_egres_imp))%>% dplyr::select(fech_ing,fech_egres_imp,diff_nas_fech_egres)
                                  dplyr::mutate(perdi_seguimiento=dplyr::case_when(is.na(fech_egres_imp)&diff_nas_fech_egres>=1095~1,TRUE~0))%>% 
                                  dplyr::filter(perdi_seguimiento==0)%>% #THEY DID NOT EVEN FINISH THE FIRST EVENT, AND MOST LIKELY NEVER RECORDED AN END DATE. THESE I DEFINITELY MUST REMOVE.
                                  dplyr::mutate(no_tienen_ni_el_primer_evento=dplyr::case_when(is.na(fech_egres_imp)&diff_nas_fech_egres<1095~1,
                                                                                     TRUE~0))%>% 
                                 # dplyr::filter(no_tienen_ni_el_primer_evento==0)%>%#THEY HAVE NOT FINISHED THE TREATMENT. SIMPLE TYPE 1 CENSORING, BUT IT DIFFERS FROM THOSE WHO NEVER EVEN HAD THE FIRST EVENT. THAT IS WHY I MUST REMOVE THOSE CASES. ALTHOUGH I AM NOT SURE, BECAUSE THESE CASES MAY ALSO BE PART OF THE AUTOMATIC CENSORING THAT R DOES.
                                  #THOSE THAT HAVE 
                                  dplyr::mutate(status=dplyr::case_when(!is.na(diff_bet_treat)~1,TRUE~0)), #censor if they have no days between treatments because they have no following one
                                    #mutate(status=dplyr::case_when(!is.na(fech_egres_imp)~1,TRUE~0)), #censor discharge dates ##supposedly this one is purer, not sure
                                type = "kaplan-meier", #The Kaplan-Meier curve is a nonparametric estimator of the survival distribution (i.e. the “estimation” component of the “test/estimation” approach to analysis of time-to-event data)
                                error = "tsiatis", conf.type = "log-log", conf.int = 0.95)
#So we only know that the patient survived AT LEAST 13 months, but we have no other information available about the patient's status.  This type of censoring (also known as "right censoring") makes linear regression an inappropriate way to analyze the data due to censoring bias.

#simple
#survfit_days_new_treat_simple<-survfit(Surv(diff_bet_treat, status) ~ motivodeegreso_mod_imp, 
#                                data=CONS_C1_df_dup_JUN_2020%>% mutate(status=dplyr::case_when(!is.na(diff_bet_treat)~1,TRUE~0))%>% data.frame())

#Using this information, we compare whether there is any difference in the survival curves between the states 
#In order to determine if there is a statistically significant difference between the survival curves, we perform what is known as a log-rank test, which tests the following hypothesis:
##H0: There is no difference in the survival function between those who were on maintenance chemotherapy and those who weren't on maintenance chemotherapy.
##Ha: There is a difference in the survival function between those who were on maintenance chemotherapy and those who weren't on maintenance chemotherapy.
#Also, with the “survdiff” command, we can run a nonparametric hypothesis test that tells us whether the difference in survival probability between subgroups is significant or not. In this case it would be, since we obtain a p-value < 0.05, with those differences appearing in the Centro and Sur zones, which is where we should carry out a more in-depth study.
  invisible(  
  survdiff(Surv(diff_bet_treat, status) ~ motivodeegreso_mod_imp, data=CONS_C1_df_dup_JUN_2020%>% mutate(status=dplyr::case_when(!is.na(diff_bet_treat)~1,TRUE~0)), rho = 0)  
  )
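  #Same test family with rho = 1 (Peto & Peto modification of the Gehan-Wilcoxon test); rho = 0 above is the standard log-rank test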
  invisible(  
  survdiff(Surv(diff_bet_treat, status) ~ motivodeegreso_mod_imp, data=CONS_C1_df_dup_JUN_2020%>% mutate(status=dplyr::case_when(!is.na(diff_bet_treat)~1,TRUE~0)), rho = 1) 
  )

#survfit_days_new_treat_simple
survfit_days_new_treat_dataframe<-summary(survfit_days_new_treat, times=seq(0, 3500, 100), print.rmean=T,digits=2)
#
data.table(survfit_days_new_treat_dataframe$table,keep.rownames = T)%>%
  knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
               caption="Table 2. Estimates related to the probability that an entry kept free of a posterior one",
               align= c("l",rep('c', 5)), col.names = c("Cause of Discharge","Records","n.max", "n.start","events","rmean","se(rmean)","median", "95%CI Lower","95%CI Upper"))%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size= 8)%>%
        kableExtra::add_footnote(paste0("Note= Users who did not finish their first treatment were discarded (n=",CONS_C1_df_dup_JUN_2020%>% 
    dplyr::mutate(diff_nas_fech_egres= as.numeric(difftime(lubridate::ymd("2019-11-13"),fech_ing, units = "days")))%>%dplyr::mutate(perdi_seguimiento=dplyr::case_when(is.na(fech_egres_imp)&diff_nas_fech_egres>=1095~1,TRUE~0))%>%dplyr::filter(perdi_seguimiento==1)%>% nrow()%>% formatC(big.mark=","),"); Excluded cases with no cause of discharge (n=",CONS_C1_df_dup_JUN_2020%>% dplyr::mutate(diff_nas_fech_egres= as.numeric(difftime(lubridate::ymd("2019-11-13"),fech_ing, units = "days")))%>% dplyr::mutate(perdi_seguimiento=dplyr::case_when(is.na(fech_egres_imp)&diff_nas_fech_egres>=1095~1,TRUE~0))%>%  dplyr::filter(perdi_seguimiento==0)%>% dplyr::mutate(status=dplyr::case_when(!is.na(diff_bet_treat)~1,TRUE~0))%>% dplyr::filter(!is.na(diff_bet_treat),is.na(motivodeegreso_mod_imp))%>% nrow() %>% formatC(big.mark=","),")"), notation = "none")%>%
  kableExtra::scroll_box(width= "100%", height = "250px")
Table 2. Estimates related to the probability that an entry kept free of a posterior one

| Cause of Discharge | Records | n.max | n.start | events | rmean | se(rmean) | median | 95%CI Lower | 95%CI Upper |
|---|---|---|---|---|---|---|---|---|---|
| motivodeegreso_mod_imp=Abandono Tardio | 9,080 | 9,080 | 9,080 | 9,080 | 578.4441 | 6.264521 | 374 | 357 | 386 |
| motivodeegreso_mod_imp=Abandono Temprano | 4,803 | 4,803 | 4,803 | 4,803 | 507.1576 | 8.337565 | 295 | 283 | 309 |
| motivodeegreso_mod_imp=Alta Administrativa | 2,998 | 2,998 | 2,998 | 2,998 | 496.9633 | 11.010819 | 262 | 241 | 286 |
| motivodeegreso_mod_imp=Alta Terapéutica | 4,826 | 4,826 | 4,826 | 4,826 | 601.5468 | 9.021110 | 383 | 363 | 401 |
| motivodeegreso_mod_imp=Derivación | 9,893 | 9,893 | 9,893 | 9,893 | 165.2633 | 3.933942 | 5 | 4 | 5 |

Note= Users who did not finish their first treatment were discarded (n=264); Excluded cases with no cause of discharge (n=9)
#median time to event (the time when half the records have an event).
#Even if median survival has been reached in a group, it might not be possible to calculate complete confidence intervals for those median values,
# just knowing the difference in median survival values doesn't necessarily tell you which is better for prognosis--then you have to specify which prognosis time you care about.
#The restricted mean (rmean) and its standard error se(rmean) are based on a truncated estimator. When the last censoring time is not random this quantity is occasionally of interest.
#The estimator of S is what is called the survival curve. 
event="no"
if(event=="si"){
plot(mfit2, col=c(1,2,1,2), lty=c(2,2,1,1),
     mark.time=FALSE, lwd=2, xscale=12,
     xlab="Years post diagnosis", ylab="Probability in State")
legend(3000, .6, c("death:female", "death:male", "pcm:female", "pcm:male"),
         col=c(1,2,1,2), lty=c(1,1,2,2), lwd=2, bty='n')
}
plot(survfit_days_new_treat,
         xlab = "Days of difference with a posterior treatment",  conf.int = T,mark.time = F,
     ylab = "Survival probability",
     col=c("yellow4","thistle","cornflowerblue","violetred3","gray20"), lwd=2) # 
legend("topright", c("Late Withdrawal", "Early Withdrawal", "Administrative Discharge", "Therapeutic Discharge","Referral"),
         col=c("yellow4","thistle","cornflowerblue","violetred3","gray20"), lty=c(1,1,1,1), lwd=2, bty='n')
mtext("Note. Users who did not finish their first treatment or did not show recurrence have been censored", side=1, line=4, cex=.7, outer=F, at=1500) # "size" is not an mtext() argument; the bare 4 was the margin line
Figure 2. Recurrence-free interval of a treatment according to cause of discharge of the first treatment



From Figure 2, we can see that most referrals were followed by a subsequent entry after zero or very few days (the median in Table 2 is 5 days). But how many entries could a user have? We generated a histogram that distinguishes users whose entries summed to 0 days of difference between them (possibly parts of the same treatment with only minor changes) from users with more days of difference between entries.
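To make the reading of the recurrence-free curves more concrete, here is a minimal Kaplan-Meier sketch on toy data using the `survival` package. It is an illustration under assumptions, not a rerun of the analysis: the toy values and the names `toy`, `days`, `status`, `cause` and `km` are hypothetical. In this toy setup, entries without a subsequent entry keep a follow-up time and get `status = 0` (right-censored) rather than a missing time.

```r
library(survival)

# Toy data: days until a subsequent entry (status = 1) or until the end of
# follow-up when no subsequent entry was observed (status = 0, censored)
toy <- data.frame(
  days   = c(0, 3, 5, 40, 200, 365, 400, 700),
  status = c(1, 1, 1, 1, 1, 0, 1, 0),
  cause  = c("Referral", "Referral", "Referral", "Referral",
             "Other", "Other", "Other", "Other")
)

# Kaplan-Meier estimate of remaining free of a subsequent entry, by cause
km <- survfit(Surv(days, status) ~ cause, data = toy)
summary(km)$table

# Log-rank test (rho = 0) for a difference between the two curves
lr <- survdiff(Surv(days, status) ~ cause, data = toy, rho = 0)
lr
```

A curve that drops almost immediately, as for the toy "Referral" group, corresponds to entries that are followed by another one within a few days.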


c26 <- c(
  "dodgerblue2", "#E31A1C", # red
  "green4",
  "#6A3D9A", # purple
  "#FF7F00", # orange
  "gray16", "gold1",
  "skyblue2", "#FB9A99", # lt pink
  "palegreen2",
  "#CAB2D6", # lt purple
  "#FDBF6F", # lt orange
  "gray70", "khaki2",
  "maroon", "orchid1", "deeppink1", "blue1", "steelblue4",
  "darkturquoise", "green1", "yellow4", "yellow3",
  "darkorange4", "brown", "gray40")
c28 <- c(
  "dodgerblue2", "#E31A1C",  "green4",  "#6A3D9A",  "#FF7F00", "gray16", "gold1", "skyblue2", "#FB9A99",  "palegreen2","orchid1", "#CAB2D6", # lt purple
  "#FDBF6F",  "gray70","deeppink1", "khaki2","steelblue4",  "maroon",  "blue1", "brown",  "darkturquoise", "green1", "yellow4", "yellow3","pink",
  "darkorange4",  "gray40", "blue","black","red","green", "orange", "white", "blue4", "violet")

get_distinct_hues <- function(ncolor,s=0.5,v=0.95,seed=350) {
  golden_ratio_conjugate <- 0.618033988749895
  set.seed(seed)
  h <- runif(1)
  H <- vector("numeric",ncolor)
  for(i in seq_len(ncolor)) {
    h <- (h + golden_ratio_conjugate) %% 1
    H[i] <- h
  }
  hsv(H,s=s,v=v)
}
p3_1<-CONS_C1_df_dup_JUN_2020%>%
  dplyr::filter(!is.na(diff_bet_treat))%>%
ggplot(aes()) + 
  geom_segment(aes(x = as.POSIXct(as.Date(fech_ing)), xend = as.POSIXct(as.Date(fech_egres_imp)),
                   y = hash_key, yend = hash_key,colour=as.factor(row),size=1/100)) + 
    scale_x_datetime(breaks=scales::date_breaks("1 year"), 
                  limits = as.POSIXct(c('2010-01-01 09:00:00','2020-01-01 09:00:00')),
                  labels = scales::date_format("%m/%y")) +
 # scale_color_manual(values=get_distinct_hues(31609)) +
  theme(axis.line=element_blank(),
          axis.ticks=element_blank(),axis.title.y=element_text(),axis.text.y=element_blank(), # axis titles are set in labs(); the first argument of element_text() is the font family
          axis.title.x=element_text(),legend.position="none",
          panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
          panel.grid.minor=element_blank(),plot.background=element_blank(), plot.title = element_text(hjust = 0))+
  scale_size_identity()+ ## to change the width of each segment
  #scale_x_date(breaks = scales::date_breaks("1 year"), date_labels = "%b %d") +
    theme(plot.caption = element_text(face= "italic",hjust = 0)) +
    labs(x = "Dates of admission and discharge", y="HASHs", 
         caption="Example of 4 clean trajectories. Colored lines represent different rows in the dataset, but same HASH")
ggplotly(p3_1)
p3<-CONS_C1_df_dup_JUN_2020%>%
  #dplyr::filter(!is.na(diff_bet_treat))%>%
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(sum_validador=sum(diff_bet_treat,na.rm=T), 
                n=n(),
                con_diff_dias=if_else(sum_validador>0,1,0,NA_real_))%>%
  distinct(hash_key,.keep_all=T)%>%
  dplyr::select(n,con_diff_dias)
  
  groupA <- p3 %>% filter(con_diff_dias == 1)
  groupB <- p3 %>% filter(con_diff_dias == 0)
  
#p3<-ggplot(p3,aes(x=n))+
#  geom_histogram_interactive()+
#  facet_wrap(~sin_diff_dias, labeller = as_labeller(c(`0` = "No differences between Treatments in Days", `1` = "Differences Between Treatments in Days")))+
#  sjPlot::theme_sjplot2()+
#    labs(x="",y="Frequencies ", x="No. of cases by user")+
# # xlim(c(2,10))+
#  ylim(c(0,15000))+
#scale_x_continuous(breaks = seq(from = 0, to = 13, by = 1))+
# theme(panel.grid.minor=element_blank(),
#       plot.background=element_blank(),
#        panel.background=element_blank(),
#       panel.border=element_blank(),
#       panel.grid.major=element_blank())+
#  labs(caption="Note. Only selected users with more than 1 case")

tooltip_css <- "background-color:gray;color:white;font-style:italic;padding:10px;border-radius:10px 20px 10px 20px;"

#ggiraph(code = {print(p3)}, tooltip_extra_css = tooltip_css, tooltip_opacity = .75 )

p3 <- plot_ly(alpha = 0.5) %>% 
  add_histogram(x = ~groupA$n,
                name = "Differences Between Treatments in Days") %>% 
    add_histogram(x = ~groupB$n,
                name = "No differences between Treatments in Days",
                marker = list(color = "rgba(150, 150, 150, 0.7)")) %>% 
  layout(barmode = "overlay",
         xaxis = list(title = "No. of cases by user",
                      
                      zeroline = T),
         yaxis = list(title = "Frequency of different users",
                      zeroline = T))%>%
layout(legend = list(orientation = "h",   # show entries horizontally
                     xanchor = "center",  # use center of legend as anchor
                     x = 0.5, y= -.04))  %>%
  config(displayModeBar = FALSE) %>%
layout(hovermode = 'compare')%>%
  layout(
    xaxis = list(
      dtick = 1, 
      tick0 = 1, 
      tickmode = "linear"
    )
  )
p3

Figure 3. Histogram of No. of Treatments depending on Sum of Diff Between Entries (if any)


As seen in Figure 3, users with only one entry make up most of the group with no cumulative days of difference between entries, simply because they had no other entry. This is why we focused on users with more than one treatment. Among them, a small number of users had 2 entries with no days of difference between them, and a much smaller number had 3 entries with no days of difference. These users possibly had a single treatment, with only minor changes from one entry to the next. Conversely, most users with days of difference between their entries had 2 cases and, exceptionally, 3 (note that many of these users could have one entry with a difference of 0 days from the next, but a third with more days, leading to a total difference greater than 0). Note also that at least one user had 13 entries.
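The grouping behind the histogram can be sketched on toy data with base R. This is a simplified illustration under assumptions (the toy values and the names `gaps`, `per_user` and `with_gap_days` are hypothetical): each user gets a count of entries and a flag for whether the sum of days between entries is greater than 0.

```r
# Toy data: one row per entry; diff_bet_treat is NA when no entry follows
gaps <- data.frame(
  hash_key       = c("a", "a", "b", "c", "c", "c"),
  diff_bet_treat = c(0, NA, NA, 0, 120, NA)
)

# One row per user: number of entries and total days between entries
per_user <- data.frame(
  n       = as.vector(table(gaps$hash_key)),
  sum_gap = as.vector(tapply(gaps$diff_bet_treat, gaps$hash_key, sum, na.rm = TRUE))
)
per_user$with_gap_days <- as.integer(per_user$sum_gap > 0)

# Cross-tabulate number of entries against the flag (the two histogram series)
table(n_entries = per_user$n, with_gap_days = per_user$with_gap_days)
```

User "b" has a single entry and therefore falls in the group without days of difference, while user "c" has one 0-day gap and one 120-day gap, so the total is greater than 0.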


We also considered it necessary to get an overview of the distribution of changes by days of difference (divided into 20 equal-sized groups) and by the cause of discharge of the first entry, to get an impression of the major changes occurring between the different entries of each user.
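Dividing the days of difference into 20 groups of roughly equal size can be approximated in base R with quantile breaks. This is a sketch on simulated values, not the exact binning used for the plot (which relies on `cut2()`); the names `gap_days`, `breaks` and `bins` are illustrative.

```r
set.seed(1)
# Simulated days of difference between entries (hypothetical values)
gap_days <- round(rexp(1000, rate = 1 / 200))

# Quantile-based breaks for 20 groups; unique() drops repeated breaks caused by ties
breaks <- unique(quantile(gap_days, probs = seq(0, 1, length.out = 21)))

# Assign each value to its interval; include.lowest keeps the minimum in the first bin
bins <- cut(gap_days, breaks = breaks, include.lowest = TRUE)
table(bins)
```

With many tied values (for example, many 0-day gaps), fewer than 20 distinct bins can result, which is worth keeping in mind when reading the x-axis labels.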


invisible(c("How many cases without differences between treatments are there, by cause of discharge?"))

p11<-
  CONS_C1_df_dup_JUN_2020%>% 
    dplyr::filter(!is.na(diff_bet_treat), !is.na(motivodeegreso_mod_imp))%>%
    #dplyr::mutate(motivoegreso_derivacion=factor(motivoegreso_derivacion, levels = c("Motivo: Otro", "Motivo: Derivación","NA")))%>%
    
    dplyr::mutate(diff_bet_treat_bar=round(diff_bet_treat,0)) %>%
    dplyr::mutate(diff_bet_treat_bar=cut2(diff_bet_treat_bar, g =20))%>%
    
    dplyr::mutate(grupo_var=factor(obs_cambios_num))%>%
    dplyr::group_by(diff_bet_treat_bar, grupo_var,motivodeegreso_mod_imp)%>%
    summarise(n_2_grupos=n())%>%
    dplyr::ungroup()%>%
    dplyr::group_by(motivodeegreso_mod_imp,diff_bet_treat_bar)%>%
    dplyr::mutate(total_n=sum(n_2_grupos))%>%
    dplyr::mutate(freq = (n_2_grupos / total_n))%>%
    dplyr::ungroup()%>%
    dplyr::group_by(motivodeegreso_mod_imp)%>%
    dplyr::mutate(total_n_solo_mot_egreso=sum(n_2_grupos))%>%
    dplyr::mutate(text=paste('% Cause Discharge by No. Days: ', scales::percent(freq,accuracy =0.01), '<br>', #formatC(positivos_acumulados, format="f", big.mark=",", digits=0)
                            'Cause of Discharge: ', motivodeegreso_mod_imp , '<br>',
                            'Total Frequency by Days: ', total_n , '<br>',
                            'Frequency of Cause of Discharge by No. Days: ',n_2_grupos, '<br>',
                            'No. Days:',diff_bet_treat_bar))%>%

    ggplot2::ggplot(aes(x = diff_bet_treat_bar, y = n_2_grupos,fill=grupo_var,text=text))+
    geom_bar(stat='identity', alpha=.8) + 
    scale_x_discrete()+
    scale_fill_manual(name= "No. of changes",values=c("cornsilk3", "lightskyblue2", "#56B4E9", "steelblue","slategray4")) +
    sjPlot::theme_sjplot2()+
    labs(x="",y="", fill="No. of changes")+
    #ylim(0,101)+
    #scale_y_continuous(limits=c(0,1),labels = scales::percent) +
    theme(legend.position="bottom")+
    guides(fill=guide_legend(title="No. of changes",ncol=5))+
    theme(legend.text = element_text(size=9))+
    theme(panel.grid.minor = element_blank(), 
          panel.grid.major = element_blank(), 
          panel.grid.major.x = element_blank(),
          panel.background = element_blank(),
          axis.title.x = element_blank())+
    theme(axis.text.x = element_text(vjust = 0.5,hjust = 0.5,angle = 90, size= 6.5), plot.caption= element_text(hjust=0))+
    facet_wrap(~motivodeegreso_mod_imp, ncol=3, labeller = as_labeller(c(`Abandono Tardio` = "Late Withdrawal", `Abandono Temprano` = "Early Withdrawal",`Alta Administrativa` = "Administrative Discharge",`Alta Terapéutica` = "Therapeutic Discharge",`Derivación` = "Referral")), strip.position = "right")+ 
    geom_vline(aes(xintercept = 45), 
               linetype = "dashed", colour = "red",size = 1)+
    labs(fill="No. of changes",caption=paste0("Note. ",CONS_C1_df_dup_JUN_2020%>% dplyr::filter(is.na(diff_bet_treat))%>% nrow() %>% formatC(big.mark = ",")," obs. had missing data that corresponded to unique treatments by users;\nDays of treatment were divided into 20 equal parts"))+
    theme(strip.background =element_rect(colour=NA,fill=NA, size=3.5))+
    theme(strip.text = element_text(colour = 'gray60', size=8),
          plot.caption= element_text(size=7))+
    theme(legend.title = element_text(colour = 'gray30', size=8))+
  theme(
    strip.text.x = element_text(margin = margin(10, 0, 10, 0))
  ) 
ggplotly(p11, tooltip = "text")%>%
    layout(legend = list(title= list(text = "Changes in SENDA,Center,Program or Plan"),
                         orientation = "h",   # show entries horizontally
                         xanchor = "center",  # use center of legend as anchor
                         x = 0.5, y=-0.09)) %>%
  config(displayModeBar = FALSE) %>%
layout(hovermode = 'compare') 

Figure 4. Distribution of the Number of Changes by Sum of Days of Difference Between Entries, by Cause of Discharge of the First Entry


The figure above supports the observation that referrals were followed by a large number of posterior entries compared to the other causes of discharge. Among entries discharged by referral, the first five bars account for most of the entries with posterior ones. Notably, these entries underwent several changes in each following treatment (around 2 or 3).


Collapse Continuous or Almost Continuous Entries into Treatments

We decided to collapse the different entries into a single treatment. This required adopting different strategies to collapse the values of variables of different types and characteristics.


##f     #Longest treatment //#g- Replacedfavored dgs.-a // #h     #Sum values x_se_trata_mujer_emb_n  usuario_tribunal_trat_droga_n discapacidad_n ha_estado_embarazada_egreso_n tiene_menores_de_edad_a_cargo_n
disc_lab<- paste0('* Some variables were transformed into different formats')
#via_adm_sus_prin sus_ini origen_ingreso
DiagrammeR::grViz(
  "digraph structs {
    node [shape=record];
    struct [label='<f1> Wide format(a)|<f2> Maximum/Last value(b)|<f3> Minimum/First value(c)|<f4> Kept more vulnerable category(d)|<f5> Same value(e)|<f6>Largest treatment(f)|<f8> Favored dgs.-a(g)|<f12> Sum values(h)'];
    struct_f1 [label='{row|nombre_centro|tipo_centro|servicio_de_salud|senda|id_centro|obs*}'];
    struct_f2 [label='{numero_de_hijos_mod**|num_hijos_trat_res_mod**|tipo_centro_derivacion|fech_egres_imp|motivodeegreso_mod_imp|macrozona|nombre_region|comuna_residencia_cod|identidad_de_genero**|tipo_de_plan_2*|id_centro*|ano_bd*}'];
    struct_f3 [label='{fech_ing|fecha_ingreso_a_convenio_senda|edad_al_ing|origen_ingreso_mod|embarazo|edad_al_ing_grupos|ano_bd*}'];
    struct_f4 [label='{escolaridad|compromiso_biopsicosocial|dg_global_nec_int_soc_or|dg_nec_int_soc_cap_hum_or|dg_nec_int_soc_cap_fis_or|dg_nec_int_soc_cap_soc_or|evaluacindelprocesoteraputico|eva_consumo|eva_fam|eva_relinterp|eva_ocupacion|eva_sm|eva_fisica|eva_transgnorma|dg_trs_psiq_cie_10_egres_or|dg_global_nec_int_soc_or_1|dg_nec_int_soc_cap_hum_or_1|dg_nec_int_soc_cap_fis_or_1|dg_nec_int_soc_cap_soc_or_1|tiene_menores_de_edad_a_cargo|x_se_trata_mujer_emb|usuario_tribunal_trat_droga|ha_estado_embarazada_egreso|dg_trs_cons_sus_or|opcion_discapacidad}']; 
    struct_f5 [label='{hash_key|id|hash_rut_completo|nacionalidad|sexo_2|id_mod|obs*|fech_nac|edad_ini_cons|edad_ini_sus_prin|estado_conyugal_2|edad_grupos|etnia_cor|nacionalidad_2|etnia_cor_2|sus_ini_mod_2|sus_ini_mod_3|sus_ini_mod|at_least_one_cont_entry}'];
    struct_f6 [label='{con_quien_vive|tipo_de_plan_2*|estatus_ocupacional|cat_ocupacional|tipo_de_vivienda_mod|tenencia_de_la_vivienda_mod|rubro_trabaja_mod|sus_principal_mod|freq_cons_sus_prin|via_adm_sus_prin_act|otras_sus1_mod|otras_sus2_mod|otras_sus3_mod|tipo_de_programa_2}'];
    struct_f8 [label='{dg_trs_psiq_dsm_iv_or|dg_trs_psiq_sub_dsm_iv_or|x2_dg_trs_psiq_dsm_iv_or|x2_dg_trs_psiq_sub_dsm_iv_or|x3_dg_trs_psiq_dsm_iv_or|x3_dg_trs_psiq_sub_dsm_iv_or|dg_trs_psiq_cie_10_or|dg_trs_psiq_sub_cie_10_or|x2_dg_trs_psiq_cie_10_or|x2_dg_trs_psiq_sub_cie_10_or|x3_dg_trs_psiq_cie_10_or|x3_dg_trs_psiq_sub_cie_10_or|diagnostico_trs_fisico|otros_probl_at_sm_or}'];
    g [label = '* Some variables were transformed into different formats; ** If not available, replaced with the last available value', width = 0.001, height = 0.001, color=White];
    struct_f12 [label='{dias_trat_imp|dias_trat_inv}']; 
    struct:f1-> struct_f1;
    struct:f2-> struct_f2;
    struct:f3-> struct_f3;
    struct:f4-> struct_f4;
    struct:f5-> struct_f5;
    struct:f6-> struct_f6;
    struct:f8-> struct_f8;
    struct:f12-> struct_f12;
    struct_f12 -> g [ dir = none,  color = 'white',fontcolor = white,shape=none, width=0, height=0];
  }")

Figure 5. Criteria to Transform Variables

  #width=14, height=7)


We generated a subset of entries that were discharged by referral and were followed by another entry less than 45 days later. We also included that posterior entry, whose values would mostly replace those of the initial and intermediate entries.
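The selection rule can be illustrated with a small sketch on hypothetical data. It assumes that `diff_bet_treat` holds the days until the next entry of the same user and that entries are sorted by ascending admission date, so the posterior entry is obtained with `lead()`; the report's own code works on a different sort order and uses `lag()`. The name `next_row` is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: user "a" has three entries, user "b" has one
toy <- data.frame(
  hash_key = c("a", "a", "a", "b"),
  row = 1:4,
  fech_ing = as.Date(c("2015-01-01", "2015-03-10", "2016-06-01", "2015-01-01")),
  motivoegreso_derivacion = c("Referral", "Other", "Other", "Other"),
  diff_bet_treat = c(5, 300, NA, NA)   # assumed: days until the user's next entry
)

pairs <- toy %>%
  arrange(hash_key, fech_ing) %>%
  group_by(hash_key) %>%
  mutate(next_row = lead(row)) %>%     # row id of the posterior entry
  ungroup() %>%
  filter(!is.na(diff_bet_treat), diff_bet_treat < 45,
         motivoegreso_derivacion == "Referral") %>%
  select(row, next_row)

# Keep the flagged entries together with their posterior entries
subset_rows <- toy %>% filter(row %in% c(pairs$row, pairs$next_row))
```

Only the first entry of user "a" meets both conditions, so the subset holds that entry and the one that follows it.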


Once we subsetted the entries, we found that many users had more than one entry that could be considered part of the same treatment.


invisible(c("2. The problem is that the filters only select the intermediate cases as candidates for transformation, but there are cases at the end that will not meet the conditions. That is problematic for cases with more than one intermediate entry"))
invisible(c("3. I think I can identify them using a further lag"))

invisible(c("1. If it is under 45 days and a referral, select the rows that could be absorbed, plus the following treatment"))
CONS_C1_JUN_2020_row_sig_row<-
  CONS_C1_df_dup_JUN_2020%>%
      #dplyr::filter(!is.na(diff_bet_treat))%>% #31,609; there are no NAs in referral, those that are NA are 0.
      dplyr::mutate(filter_complex= dplyr::case_when(!is.na(diff_bet_treat) & diff_bet_treat<45 & as.character(motivoegreso_derivacion)=="Referral"~1,TRUE~0))%>%
      dplyr::arrange(hash_key)%>%
      dplyr::group_by(hash_key)%>%
      dplyr::mutate(sig_row=lag(row), sig_fech_egres_imp=lag(fech_egres_imp),sig_motivoegres_ref=lag(motivoegreso_derivacion),n_por_hash=n())%>%
      ungroup()%>%
      dplyr::filter(filter_complex==1)%>%
      # dplyr::select(row,hash_key,fech_ing,fech_egres_imp,motivoegreso_derivacion,obs_cambios,diff_bet_treat,sig_row,n_por_hash) %>% View() #0007678b8b35fa0961d1e8110fbf9620 
      dplyr::select(row,sig_row)
#6,809*2 =   13,618

  CONS_C1_df_dup_JUN_2020%>%
    dplyr::filter(row %in% unlist(c(CONS_C1_JUN_2020_row_sig_row$row,CONS_C1_JUN_2020_row_sig_row$sig_row)))%>%
    dplyr::arrange(hash_key,desc(fech_ing))%>%
    dplyr::mutate(filter_complex_anterior= dplyr::case_when(!is.na(lag(diff_bet_treat)) & lag(diff_bet_treat)<45 & as.character(lag(motivoegreso_derivacion))=="Referral"~1,TRUE~0))%>%
    dplyr::mutate(filter_complex= dplyr::case_when(!is.na(diff_bet_treat) & diff_bet_treat<45 & as.character(motivoegreso_derivacion)=="Referral"~1,TRUE~0))%>%
    dplyr::group_by(hash_key)%>%
    dplyr::mutate(sum_validadores=sum(filter_complex))%>%
    dplyr::mutate(n_complex = str_count(filter_complex, '0'))%>%
    ungroup()%>%
    dplyr::mutate(cumsum_n_complex = cumsum(n_complex))%>%
    dplyr::group_by(hash_key)%>%
    dplyr::mutate(n_dist=n_distinct(cumsum_n_complex))%>%
    dplyr::filter(n_dist>3)%>%
    dplyr::select(hash_key,fech_ing,fech_egres_imp,motivodeegreso_mod_imp,diff_bet_treat,filter_complex,filter_complex_anterior,n_dist,cumsum_n_complex)%>%
    
    knitr::kable(format= "html", format.args= list(decimal.mark= ".", big.mark= ","),
                 caption="Table 3. Groups w/ more than 3 distinct groups of cases that fulfilled the conditions to categorize as continuous treatments by each user",
                 align= c("l",rep('c', 5)), col.names = c("User","Date of Admission","Date of Discharge", "Cause of Discharge", "Diff Between Treatments","<45 & Referral","<45 & Referral of Previous Treatment","No. groups within users","ID of groups"))%>%
    kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),font_size= 8)%>%
    kableExtra::kable_styling(full_width = F)%>%
          kableExtra::add_footnote(paste0("Note= Cases with an entry that had a referral as a cause of discharge and <45 days (n= ", CONS_C1_JUN_2020_row_sig_row%>% nrow()%>% formatC(big.mark=","),")"), notation = "none")
Table 3. Groups w/ more than 3 distinct groups of cases that fulfilled the conditions to categorize as continuous treatments by each user
User Date of Admission Date of Discharge Cause of Discharge Diff Between Treatments <45 & Referral <45 & Referral of Previous Treatment No. groups within users ID of groups
0f4aa2f78fa5da961404e6e5389ad76c 2017-04-03 2017-04-08 Abandono Temprano 2 0 1 4 352
0f4aa2f78fa5da961404e6e5389ad76c 2017-01-31 2017-03-29 Derivación 5 1 0 4 352
0f4aa2f78fa5da961404e6e5389ad76c 2016-05-27 2016-08-12 Abandono Temprano 172 0 1 4 353
0f4aa2f78fa5da961404e6e5389ad76c 2015-10-19 2016-05-23 Derivación 4 1 0 4 353
0f4aa2f78fa5da961404e6e5389ad76c 2015-07-14 2015-10-16 Derivación 3 1 1 4 353
0f4aa2f78fa5da961404e6e5389ad76c 2014-07-23 2014-08-11 Abandono Temprano 337 0 1 4 354
0f4aa2f78fa5da961404e6e5389ad76c 2014-07-08 2014-07-21 Derivación 2 1 0 4 354
0f4aa2f78fa5da961404e6e5389ad76c 2014-06-05 2014-07-01 Alta Terapéutica 7 0 1 4 355
0f4aa2f78fa5da961404e6e5389ad76c 2013-11-04 2014-06-05 Derivación 0 1 0 4 355
1173f19959cadd5542a584ab94ca87b7 2017-11-22 2018-05-08 Alta Administrativa 155 0 1 4 418
1173f19959cadd5542a584ab94ca87b7 2017-07-10 2017-11-21 Derivación 1 1 0 4 418
1173f19959cadd5542a584ab94ca87b7 2014-07-07 2016-04-13 Alta Terapéutica 453 0 1 4 419
1173f19959cadd5542a584ab94ca87b7 2014-04-04 2014-07-07 Derivación 0 1 0 4 419
1173f19959cadd5542a584ab94ca87b7 2013-04-26 2013-08-01 Derivación 246 0 1 4 420
1173f19959cadd5542a584ab94ca87b7 2013-01-23 2013-04-01 Derivación 25 1 0 4 420
1173f19959cadd5542a584ab94ca87b7 2012-04-02 2013-01-21 Derivación 2 1 1 4 420
1173f19959cadd5542a584ab94ca87b7 2012-02-15 2012-04-02 Abandono Temprano 0 0 1 4 421
1173f19959cadd5542a584ab94ca87b7 2011-09-22 2012-02-13 Derivación 2 1 0 4 421
25c36b6820ac514094c458ba22918452 2017-08-02 2018-02-19 Alta Terapéutica 194 0 1 4 940
25c36b6820ac514094c458ba22918452 2017-05-03 2017-08-01 Derivación 1 1 0 4 940
25c36b6820ac514094c458ba22918452 2016-08-18 2017-04-28 Alta Terapéutica 5 0 1 4 941
25c36b6820ac514094c458ba22918452 2016-05-27 2016-07-19 Derivación 30 1 0 4 941
25c36b6820ac514094c458ba22918452 2016-03-04 2016-05-22 Abandono Temprano 5 0 1 4 942
25c36b6820ac514094c458ba22918452 2016-01-04 2016-03-03 Derivación 1 1 0 4 942
25c36b6820ac514094c458ba22918452 2015-09-24 2015-12-31 Derivación 4 1 1 4 942
25c36b6820ac514094c458ba22918452 2011-01-31 2011-12-30 Alta Administrativa 279 0 1 4 943
25c36b6820ac514094c458ba22918452 2010-08-20 2010-12-22 Derivación 40 1 0 4 943
c81df65dbf73521d91ff7c65a3c7ceba 2015-04-13 2015-06-01 Abandono Temprano 36 0 1 4 4,783
c81df65dbf73521d91ff7c65a3c7ceba 2014-06-30 2015-04-07 Derivación 6 1 0 4 4,783
c81df65dbf73521d91ff7c65a3c7ceba 2013-12-17 2014-03-01 Abandono Temprano 80 0 1 4 4,784
c81df65dbf73521d91ff7c65a3c7ceba 2013-08-26 2013-12-17 Derivación 0 1 0 4 4,784
c81df65dbf73521d91ff7c65a3c7ceba 2013-03-22 2013-08-01 Abandono Tardio 25 0 1 4 4,785
c81df65dbf73521d91ff7c65a3c7ceba 2013-01-02 2013-03-11 Derivación 11 1 0 4 4,785
c81df65dbf73521d91ff7c65a3c7ceba 2012-08-20 2012-09-10 Abandono Temprano 114 0 1 4 4,786
c81df65dbf73521d91ff7c65a3c7ceba 2012-07-02 2012-08-18 Derivación 2 1 0 4 4,786
d375edb930e2d3f4517f2307200b1cf2 2019-09-03 NA NA NA 0 1 4 5,044
d375edb930e2d3f4517f2307200b1cf2 2019-07-15 2019-08-09 Derivación 25 1 0 4 5,044
d375edb930e2d3f4517f2307200b1cf2 2019-03-05 2019-03-29 Derivación 108 0 1 4 5,045
d375edb930e2d3f4517f2307200b1cf2 2018-12-27 2019-03-01 Derivación 4 1 0 4 5,045
d375edb930e2d3f4517f2307200b1cf2 2017-11-28 2018-01-08 Derivación 353 0 1 4 5,046
d375edb930e2d3f4517f2307200b1cf2 2017-04-17 2017-11-27 Derivación 1 1 0 4 5,046
d375edb930e2d3f4517f2307200b1cf2 2013-09-24 2013-10-18 Abandono Temprano 648 0 1 4 5,047
d375edb930e2d3f4517f2307200b1cf2 2013-06-13 2013-08-30 Derivación 25 1 0 4 5,047
e81d886539caa4e9b2527984cacd7ec0 2018-02-08 2018-04-09 Derivación NA 0 1 4 5,549
e81d886539caa4e9b2527984cacd7ec0 2017-10-10 2018-02-01 Derivación 7 1 0 4 5,549
e81d886539caa4e9b2527984cacd7ec0 2016-08-16 2017-01-01 Abandono Tardio 282 0 1 4 5,550
e81d886539caa4e9b2527984cacd7ec0 2016-03-14 2016-08-10 Derivación 6 1 0 4 5,550
e81d886539caa4e9b2527984cacd7ec0 2016-01-19 2016-03-10 Derivación 4 1 1 4 5,550
e81d886539caa4e9b2527984cacd7ec0 2015-02-24 2015-06-02 Alta Terapéutica 231 0 1 4 5,551
e81d886539caa4e9b2527984cacd7ec0 2015-01-12 2015-02-23 Derivación 1 1 0 4 5,551
e81d886539caa4e9b2527984cacd7ec0 2013-11-28 2014-03-13 Derivación 305 0 1 4 5,552
e81d886539caa4e9b2527984cacd7ec0 2013-10-02 2013-11-22 Derivación 6 1 0 4 5,552
fba24f5affb5795f58a61bed2019722a 2015-11-02 2016-05-02 Alta Administrativa NA 0 1 4 6,025
fba24f5affb5795f58a61bed2019722a 2015-07-01 2015-11-01 Derivación 1 1 0 4 6,025
fba24f5affb5795f58a61bed2019722a 2014-07-23 2015-07-01 Derivación 0 1 1 4 6,025
fba24f5affb5795f58a61bed2019722a 2013-10-21 2014-01-30 Derivación 67 0 1 4 6,026
fba24f5affb5795f58a61bed2019722a 2013-07-30 2013-10-18 Derivación 3 1 0 4 6,026
fba24f5affb5795f58a61bed2019722a 2013-04-09 2013-07-29 Alta Administrativa 1 0 1 4 6,027
fba24f5affb5795f58a61bed2019722a 2012-07-04 2013-04-09 Derivación 0 1 0 4 6,027
fba24f5affb5795f58a61bed2019722a 2012-05-01 2012-07-02 Derivación 2 1 1 4 6,027
fba24f5affb5795f58a61bed2019722a 2012-02-08 2012-02-24 Abandono Temprano 67 0 1 4 6,028
fba24f5affb5795f58a61bed2019722a 2011-07-14 2012-01-31 Derivación 8 1 0 4 6,028
Note= Cases with an entry that had a referral as a cause of discharge and <45 days (n= 6,809)
  #%>%
  #  kableExtra::scroll_box(width= "100%", height = "30%")


We applied these criteria to every group of entries that could be considered part of one continuous treatment.
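One way to picture how consecutive entries end up in the same treatment is the sketch below, on hypothetical data. It assumes a 0/1 flag, here called `continues`, that equals 1 when an entry was discharged by referral and followed by another entry within 45 days. Sorting from the most recent entry backwards, every entry with a flag of 0 closes a treatment, so a running count of zeros gives a treatment identifier. The name `treat_id` is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: four entries of one user
toy <- data.frame(
  hash_key  = c("a", "a", "a", "a"),
  fech_ing  = as.Date(c("2014-01-01", "2014-03-01", "2014-04-01", "2016-01-01")),
  continues = c(0, 1, 0, 0)   # assumed flag: 1 = referral followed within 45 days
)

grouped <- toy %>%
  arrange(hash_key, desc(fech_ing)) %>%
  # each entry with continues == 0 starts a new group when read from newest to oldest
  mutate(treat_id = paste0(hash_key, "_", cumsum(continues == 0)))

grouped
```

Here the entries admitted on 2014-03-01 and 2014-04-01 share one identifier, while the other two entries stay as separate treatments.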


For variables such as the primary substance, other substances and educational attainment, we ordered the categories by vulnerability and kept the most vulnerable one. For variables without a clear hierarchy of vulnerability, we kept the value from the entry with the longest treatment. When more than one entry shared the maximum number of days in treatment (counted up to the date of retrieval for ongoing treatments), the rank picked one of the tied rows at random.
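Both rules can be sketched on a toy treatment made of three entries. The data, the categories of `con_quien_vive` and the column `dias_trat` are hypothetical; the numeric coding of `escolaridad` follows the one used in the report, where a higher number means a more vulnerable category.

```r
library(dplyr)

# Hypothetical toy data: three entries that belong to the same treatment
toy <- data.frame(
  treat_id = c("a_1", "a_1", "a_1"),
  escolaridad = c("Mayor a Ed Secundaria", "Ed Primaria Completa o Menor", NA),
  con_quien_vive = c("Solo", "Familia", "Pareja"),   # hypothetical categories
  dias_trat = c(30, 200, 90)
)

set.seed(1234)  # only matters when two entries tie on days in treatment
collapsed <- toy %>%
  mutate(escolaridad_num = case_when(
    escolaridad == "Mayor a Ed Secundaria" ~ 1,
    escolaridad == "Ed Secundaria Completa o Menor" ~ 2,
    escolaridad == "Ed Primaria Completa o Menor" ~ 3,
    TRUE ~ NA_real_
  )) %>%
  group_by(treat_id) %>%
  mutate(rank_days = rank(-dias_trat, ties.method = "random")) %>%  # 1 = longest
  arrange(treat_id, rank_days) %>%
  summarise(
    mod_d_escolaridad = max(escolaridad_num, na.rm = TRUE),         # most vulnerable
    mod_f_con_quien_vive = first(na.omit(con_quien_vive)),          # longest entry
    .groups = "drop"
  )
```

The collapsed row keeps the most vulnerable educational category (coded 3) and the living arrangement recorded in the 200-day entry.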


For the type of plan, we kept two variables: one with the last plan and another with the plan of the longest entry.
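A minimal sketch of the two plan variables, assuming that "last" means the entry with the most recent admission date and "longest" the entry with the most days in treatment. The plan labels, `dias_trat` and the output names `plan_last` and `plan_longest` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: three entries of the same treatment
toy <- data.frame(
  treat_id = c("a_1", "a_1", "a_1"),
  fech_ing = as.Date(c("2015-01-01", "2015-02-15", "2015-09-01")),
  tipo_de_plan_2 = c("Plan A", "Plan B", "Plan C"),   # hypothetical labels
  dias_trat = c(40, 190, 60)
)

plans <- toy %>%
  group_by(treat_id) %>%
  summarise(
    plan_last    = tipo_de_plan_2[which.max(as.numeric(fech_ing))],  # latest admission
    plan_longest = tipo_de_plan_2[which.max(dias_trat)],             # most days
    .groups = "drop"
  )
```

Keeping both columns preserves where the user ended up as well as the plan in which most of the treatment took place.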


For other substances at admission, we generated five variables (otras_sus1_mod, otras_sus2_mod, otras_sus3_mod, sus_ini_2_mod and sus_ini_3_mod) that keep only the main substances (alcohol, cocaine, marijuana and cocaine base paste) and recode every other substance as "Otros" (Other).
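The recoding can be sketched as follows on hypothetical data. The vector `main_substances` is a helper introduced for illustration; missing values are left as missing, which is our reading of how the comparison-based recode of the report behaves.

```r
library(dplyr)

# Helper introduced for this sketch: categories kept as they are
main_substances <- c("Alcohol", "Cocaína", "Marihuana", "Pasta Base")

# Hypothetical toy data
toy <- data.frame(
  otras_sus1 = c("Alcohol", "Tranquilizantes", NA),
  otras_sus2 = c("Marihuana", "Pasta Base", "Inhalables")
)

recoded <- toy %>%
  mutate(across(c(otras_sus1, otras_sus2),
                ~ case_when(is.na(.x) ~ NA_character_,          # keep missing values
                            .x %in% main_substances ~ .x,       # keep main substances
                            TRUE ~ "Otros"),                    # everything else
                .names = "{col}_mod"))
```

The original columns stay untouched and the recoded versions are added with the `_mod` suffix.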


invisible(c("NOTE: I SET EVAL=F"))
invisible(c("4. examples of cases with distinct continuous treatments"))
   # dplyr::filter(sum_validadores>1) #ffd3f4ed5841cfac947ce546757b8e3f is a case with a couple of days that could be collapsed: discharge date 2014-11-27 and then admitted on 2015-03-27 // the same with 015ea90c1b1655155f30a3e276436ed5 from 2009-09-25 to 2012-02-29 and then from 2018-07-03 to 2019-08-30
#12,945, possibly there are other overlapping pairs
invisible(c("5. There are cases with up to 7 continuous treatments?, odd; I checked it and it is correct"))
#dd0d42261d00273d4e19ff2a46bda4b9_5276 dd0d42261d00273d4e19ff2a46bda4b9, there can be up to 7 continuous treatments?
toString2<-
function (x, width = NULL, ...) 
        {
            string <- paste(x, collapse = "; ")
            if (missing(width) || is.null(width) || width == 0) 
                return(string)
            if (width < 0) 
                stop("'width' must be positive")
            if (nchar(string, type = "w") > width) {
                width <- max(6, width)
                string <- paste0(strtrim(string, width - 4), "....")
            }
            string
}

CONS_C1_df_dup_JUN_2020%>%
#FILTER VARIABLES
#_#_#_#_#_#_#_
    dplyr::filter(row %in% unlist(c(CONS_C1_JUN_2020_row_sig_row$row,CONS_C1_JUN_2020_row_sig_row$sig_row)))%>%
    dplyr::mutate(ano_bd2=ano_bd)%>%
    dplyr::arrange(hash_key,desc(fech_ing))%>%
    dplyr::mutate(filter_complex_anterior= dplyr::case_when(!is.na(lag(diff_bet_treat)) & lag(diff_bet_treat)<45 & as.character(lag(motivoegreso_derivacion))=="Referral"~1,TRUE~0))%>%
    dplyr::mutate(filter_complex= dplyr::case_when(!is.na(diff_bet_treat) & diff_bet_treat<45 & as.character(motivoegreso_derivacion)=="Referral"~1,TRUE~0))%>%
    dplyr::group_by(hash_key)%>%
    dplyr::mutate(sum_validadores=sum(filter_complex))%>%
    dplyr::mutate(n_complex = str_count(filter_complex, '0'))%>%
    ungroup()%>%
    dplyr::mutate(cumsum_n_complex = cumsum(n_complex))%>%
    dplyr::mutate(concat_hash_id_treatments=paste0(hash_key,"_",cumsum_n_complex))%>%
    dplyr::group_by(concat_hash_id_treatments)%>%
    dplyr::mutate(rn_common_treats=row_number())%>% 
    dplyr::mutate(rn_common_treats2=row_number())%>% 
    ungroup()%>%
    dplyr::mutate(tipo_de_plan_2_for_f=tipo_de_plan_2)%>%
    dplyr::mutate(mod_0_row=row)%>%
    dplyr::mutate(obs_for_e=obs)%>%
#_#_#_#_#_#_#_#_#_#_
#a
#_#_#_#_#_#_#_#_#_#_
    #dplyr::filter(hash_key=="dd0d42261d00273d4e19ff2a46bda4b9")%>% dplyr::select(row,hash_key, concat_hash_id_treatments, fech_ing, fech_egres_imp, diff_bet_treat)%>% View()
##IMPORTANT: TO SEE DISTINCT TYPES
    #dplyr::mutate(n_dist=n_distinct(cumsum_n_complex))%>%
    #dplyr::filter(n_dist>3)%>%
    #dplyr::select(hash_key,fech_ing,fech_egres_imp,motivodeegreso_mod_imp,diff_bet_treat,filter_complex,filter_complex_anterior,n_dist,cumsum_n_complex)%>%
                tidyr::pivot_wider(names_from =  rn_common_treats, 
                                   names_sep="_",
                                   values_from = c(row, tipo_centro, servicio_de_salud, senda,id_centro, tipo_de_plan_2,obs))%>%
  dplyr::group_by(concat_hash_id_treatments)%>%
  dplyr::mutate_at(vars(row_1:obs_7),~max(as.character(.),na.rm=T))%>%
  dplyr::ungroup()%>%
  unite(., col = "mod_a_row",  row_1:row_7, na.rm=TRUE, sep = "; ")%>%
  unite(., col = "mod_a_tipo_centro",  tipo_centro_1:tipo_centro_7, na.rm=TRUE, sep = "; ")%>%
  unite(., col = "mod_a_servicio_de_salud",  servicio_de_salud_1:servicio_de_salud_7, na.rm=TRUE, sep = "; ")%>%
  unite(., col = "mod_a_senda",  senda_1:senda_7, na.rm=TRUE, sep = "; ")%>%
  unite(., col = "mod_a_id_centro",  id_centro_1:id_centro_7, na.rm=TRUE, sep = "; ")%>%
  unite(., col = "mod_a_tipo_de_plan_2",  tipo_de_plan_2_1:tipo_de_plan_2_7, na.rm=TRUE, sep = "; ")%>%
  unite(., col = "mod_a_obs",  obs_1:obs_7, na.rm=TRUE, sep = "; ")%>%
  dplyr::mutate(mod_a_obs=sub("^;", "", mod_a_obs))%>%
  tidyr::separate(mod_a_obs,into=paste0("obs",1:30), sep=";")%>%
  dplyr::mutate(across(c(obs1:obs30),~stringr::str_trim(.)))%>%
  dplyr::mutate(mod_a_obs = pmap_chr(select(.,obs1:obs30), ~toString2(unique(na.omit(c(...))))))%>%
    
  # strip leading, trailing and doubled separators ("=" so that mod_a_obs is actually updated)
  dplyr::mutate(mod_a_obs=sub("^; ; ", "", mod_a_obs))%>%
  dplyr::mutate(mod_a_obs=sub("^; ", "", mod_a_obs))%>%
  dplyr::mutate(mod_a_obs=sub("^;", "", mod_a_obs))%>% 
  dplyr::mutate(mod_a_obs=sub("; ; $", "", mod_a_obs))%>%
  dplyr::mutate(mod_a_obs=sub("; $", "", mod_a_obs))%>%
  dplyr::mutate(mod_a_obs=sub(";$", "", mod_a_obs))%>%
  dplyr::mutate(mod_a_obs=sub("; ; ", "; ", mod_a_obs))%>%
  dplyr::mutate(mod_a_obs=sub("; ;", "; ", mod_a_obs))%>%
  
  dplyr::mutate(mod_b_tipo_de_plan_2=sub(".*\\;","",mod_a_tipo_de_plan_2))%>% #last treatment
  dplyr::mutate(mod_b_id_centro=sub(".*\\;","",mod_a_id_centro))%>% #last treatment
    #dplyr::mutate(across(c(mod_a_senda),~str_count(., pattern = ";"),.names="{col}_cnt"))%>% dplyr::select(mod_a_senda_cnt)%>% summary()
  #what happened to senda with the concatenated values, they are missing
#_#_#_#_#_#_#_#_#_#_
#b Last value
#_#_#_#_#_#_#_#_#_#_
  dplyr::group_by(concat_hash_id_treatments)%>%
  dplyr::mutate(n_concat_hash_id_treatments=n())%>%
  dplyr::mutate(fech_egres_imp=as.character(fech_egres_imp))%>%
  dplyr::mutate(fech_egres_imp=ifelse(n_concat_hash_id_treatments==1 & is.na(fech_egres_imp),"2019-11-13",fech_egres_imp))%>%
  dplyr::mutate(motivodeegreso_mod_imp=ifelse(n_concat_hash_id_treatments==1 & is.na(motivodeegreso_mod_imp),"En curso",as.character(motivodeegreso_mod_imp)))%>%
  dplyr::mutate(across(c(numero_de_hijos_mod, num_hijos_trat_res_mod,identidad_de_genero),~dplyr::first(na.omit(.)),.names = "mod_b_{col}"))%>%
  dplyr::mutate(across(c(tipo_centro_derivacion, motivodeegreso_mod_imp, fech_egres_imp, macrozona, nombre_region, nombre_centro, comuna_residencia_cod,ano_bd),~dplyr::first(.),.names = "mod_b_{col}"))%>%
  
  #dplyr::select(hash_key,concat_hash_id_treatments,  n_concat_hash_id_treatments,numero_de_hijos, num_hijos_ing_trat_res, fech_egres_imp, motivodeegreso_mod_imp, macrozona, nombre_region, comuna_residencia_cod,identidad_de_genero,starts_with("mod_"))%>%dplyr::filter(concat_hash_id_treatments=="dd0d42261d00273d4e19ff2a46bda4b9_5276"|hash_key=="015ea90c1b1655155f30a3e276436ed5"|hash_key=="ffd3f4ed5841cfac947ce546757b8e3f")%>%
   #dplyr::filter(hash_key=="ffd3f4ed5841cfac947ce546757b8e3f")%>%View() #CASES WITH 2 TREATMENTS ARE INDEED FILLED IN.
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#c First value
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::ungroup()%>%
  dplyr::mutate(fech_ing=as.character(fech_ing))%>%
  dplyr::group_by(concat_hash_id_treatments)%>%
  dplyr::mutate(across(c(fech_ing, fecha_ingreso_a_convenio_senda, embarazo, edad_al_ing, origen_ingreso_mod, edad_al_ing_grupos,ano_bd2),~dplyr::last(na.omit(.)),.names = "mod_c_{col}"))%>%
  dplyr::ungroup()%>%
  assign("CONS_C1_df_dup_JUN_2020_a_c",., envir = .GlobalEnv)
    #dplyr::select(hash_key,concat_hash_id_treatments,  n_concat_hash_id_treatments,fech_ing, fecha_ingreso_a_convenio_senda, edad_al_ing, origen_ingreso_mod, edad_al_ing_grupos, starts_with("mod_"))%>%dplyr::filter(concat_hash_id_treatments=="dd0d42261d00273d4e19ff2a46bda4b9_5276"|hash_key=="015ea90c1b1655155f30a3e276436ed5"|hash_key=="ffd3f4ed5841cfac947ce546757b8e3f")%>%View()

#CONS_C1_df_dup_JUN_2020_a_c%>% janitor::tabyl(mod_a_obs)%>% dplyr::select(mod_a_obs)%>% data.frame()%>% View()
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#d     #First order by vulnerability, then make the selection
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  
CONS_C1_df_dup_JUN_2020_a_c%>%
  dplyr::mutate(compromiso_biopsicosocial=dplyr::case_when(compromiso_biopsicosocial=="Leve"~1,compromiso_biopsicosocial=="Moderado"~2,compromiso_biopsicosocial=="Severo"~3,TRUE~NA_real_))%>%
  dplyr::mutate(escolaridad=dplyr::case_when(escolaridad=="Mayor a Ed Secundaria"~1,escolaridad=="Ed Secundaria Completa o Menor"~2,escolaridad=="Ed Primaria Completa o Menor"~3,TRUE~NA_real_))%>%
  dplyr::mutate(across(c(dg_global_nec_int_soc_or, dg_nec_int_soc_cap_hum_or, dg_nec_int_soc_cap_fis_or, dg_nec_int_soc_cap_soc_or,dg_global_nec_int_soc_or_1, dg_nec_int_soc_cap_hum_or_1, dg_nec_int_soc_cap_fis_or_1,dg_nec_int_soc_cap_soc_or_1),~dplyr::case_when(as.character(.)=="Bajas"~3,as.character(.)=="Medias"~2,as.character(.)=="Altas"~1,TRUE~NA_real_)))%>%
  dplyr::mutate(across(c(evaluacindelprocesoteraputico, eva_consumo, eva_fam, eva_relinterp, eva_ocupacion, eva_sm, eva_fisica, eva_transgnorma),~dplyr::case_when(as.character(.)=="Logro Mínimo"~3,as.character(.)=="Logro M?mo"~3,as.character(.)=="Logro Intermedio"~2,as.character(.)=="Logro Alto"~1,TRUE~NA_real_)))%>% 
 #dplyr::select(hash_key,concat_hash_id_treatments, n_concat_hash_id_treatments,compromiso_biopsicosocial,escolaridad,dg_global_nec_int_soc_or, dg_nec_int_soc_cap_hum_or, dg_nec_int_soc_cap_fis_or, dg_nec_int_soc_cap_soc_or,dg_global_nec_int_soc_or_1, dg_nec_int_soc_cap_hum_or_1, dg_nec_int_soc_cap_fis_or_1, dg_nec_int_soc_cap_soc_or_1,evaluacindelprocesoteraputico, eva_consumo, eva_fam, eva_relinterp, eva_ocupacion, eva_sm, eva_fisica, eva_transgnorma,starts_with("mod_"))%>%dplyr::filter(concat_hash_id_treatments=="dd0d42261d00273d4e19ff2a46bda4b9_5276"|hash_key=="015ea90c1b1655155f30a3e276436ed5"|hash_key=="ffd3f4ed5841cfac947ce546757b8e3f")%>% 

  dplyr::group_by(concat_hash_id_treatments)%>%
  
      dplyr::mutate(across(c(compromiso_biopsicosocial,escolaridad,dg_global_nec_int_soc_or, dg_nec_int_soc_cap_hum_or, dg_nec_int_soc_cap_fis_or, dg_nec_int_soc_cap_soc_or,dg_global_nec_int_soc_or_1, dg_nec_int_soc_cap_hum_or_1, dg_nec_int_soc_cap_fis_or_1, dg_nec_int_soc_cap_soc_or_1,evaluacindelprocesoteraputico, eva_consumo, eva_fam, eva_relinterp, eva_ocupacion, eva_sm, eva_fisica, eva_transgnorma),~max(.,na.rm=T),.names = "mod_d_{col}"))%>%
    
      dplyr::mutate(tiene_menores_de_edad_a_cargo_n=ifelse(as.character(tiene_menores_de_edad_a_cargo)=="si",1,0),tiene_menores_de_edad_a_cargo_n=sum(tiene_menores_de_edad_a_cargo_n,na.rm=T),mod_d_tiene_menores_de_edad_a_cargo=ifelse(mod_b_numero_de_hijos_mod>0 & tiene_menores_de_edad_a_cargo_n>0,"si","no"))%>%
      
      dplyr::mutate(x_se_trata_mujer_emb_n=ifelse(as.character(x_se_trata_mujer_emb)=="Si",1,0),x_se_trata_mujer_emb_n=sum(x_se_trata_mujer_emb_n,na.rm=T),mod_d_x_se_trata_mujer_emb=ifelse(x_se_trata_mujer_emb_n>0,"Si","No"))%>%
      
      dplyr::mutate(usuario_tribunal_trat_droga_n=ifelse(as.character(usuario_tribunal_trat_droga)=="Si",1,0),usuario_tribunal_trat_droga_n=sum(usuario_tribunal_trat_droga_n,na.rm=T),mod_d_usuario_tribunal_trat_droga=ifelse(usuario_tribunal_trat_droga_n>0,"Si","No"))%>%
      
      dplyr::mutate(discapacidad_n=ifelse(as.character(discapacidad)=="si",1,0),discapacidad_n=sum(discapacidad_n,na.rm=T),mod_d_discapacidad=ifelse(discapacidad_n>0,"si","no"))%>%
      dplyr::mutate(ha_estado_embarazada_egreso_n=ifelse(as.character(ha_estado_embarazada_egreso)=="si",1,0),ha_estado_embarazada_egreso_n=sum(ha_estado_embarazada_egreso_n,na.rm=T),mod_d_ha_estado_embarazada_egreso=ifelse(ha_estado_embarazada_egreso_n>0,"si","no"))%>%
      dplyr::mutate(dg_trs_cons_sus_or_n=ifelse(as.character(dg_trs_cons_sus_or)=="Dependencia",1,0),dg_trs_cons_sus_or_n=sum(dg_trs_cons_sus_or_n,na.rm=T),mod_d_dg_trs_cons_sus_or=ifelse(dg_trs_cons_sus_or_n>0,"Dependencia","Consumo Perjudicial"))%>%
      dplyr::mutate(mod_d_opcion_discapacidad=max(as.character(opcion_discapacidad),na.rm=T))%>%
  dplyr::ungroup()%>%
 # dplyr::select(hash_key,concat_hash_id_treatments, n_concat_hash_id_treatments,compromiso_biopsicosocial,escolaridad,dg_global_nec_int_soc_or, dg_nec_int_soc_cap_hum_or, dg_nec_int_soc_cap_fis_or, dg_nec_int_soc_cap_soc_or,dg_global_nec_int_soc_or_1, dg_nec_int_soc_cap_hum_or_1, dg_nec_int_soc_cap_fis_or_1, dg_nec_int_soc_cap_soc_or_1,evaluacindelprocesoteraputico, eva_consumo, eva_fam, eva_relinterp, eva_ocupacion, eva_sm, eva_fisica, eva_transgnorma,tiene_menores_de_edad_a_cargo,x_se_trata_mujer_emb,usuario_tribunal_trat_droga,discapacidad,ha_estado_embarazada_egreso,starts_with("mod_d_"))%>%dplyr::filter(concat_hash_id_treatments=="dd0d42261d00273d4e19ff2a46bda4b9_5276"|hash_key=="015ea90c1b1655155f30a3e276436ed5"|hash_key=="ffd3f4ed5841cfac947ce546757b8e3f")%>% 

assign("CONS_C1_df_dup_JUN_2020_a_d",., envir = .GlobalEnv)

#what about opcion_discapacidad?
#CONS_C1_df_dup_JUN_2020_a_d%>% janitor::tabyl(mod_d_discapacidad, opcion_discapacidad)
#CONS_C1_df_dup_JUN_2020_a_d%>% dplyr::select(mod_0_row,concat_hash_id_treatments,mod_d_discapacidad, opcion_discapacidad)%>% dplyr::filter(mod_d_discapacidad=="si")%>% View()

#dg_trs_cons_sus_or

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#e     #keep as is; add hash_key
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      

invisible(c("hash_key","id","hash_rut_completo","nacionalidad","sexo_2","id_mod","obs","fech_nac","edad_ini_cons","edad_ini_sus_prin","sus_ini","estado_conyugal_2","edad_grupos","etnia_cor","nacionalidad_2","etnia_cor_2","sus_ini_2","sus_ini_3","sus_ini_mod","obs_cambios","obs_cambios_ninguno","obs_cambios_num","obs_cambios_fac","at_least_one_cont_entry"))

CONS_C1_df_dup_JUN_2020_a_d%>%
      dplyr::mutate(across(c(sus_ini_2, sus_ini_3),~dplyr::case_when(as.character(.)!="Alcohol"&as.character(.)!="Cocaína"&as.character(.)!="Marihuana"&as.character(.)!="Pasta Base"~"Otros",TRUE~as.character(.)),.names = "{col}_mod"))%>%
  dplyr::mutate(across(c(hash_key,id,hash_rut_completo,nacionalidad,sexo_2,id_mod,obs_for_e,fech_nac,edad_ini_cons,edad_ini_sus_prin,sus_ini,estado_conyugal_2,edad_grupos,etnia_cor,nacionalidad_2,etnia_cor_2,sus_ini_2_mod,sus_ini_3_mod,sus_ini_mod,obs_cambios,obs_cambios_ninguno,obs_cambios_num,obs_cambios_fac,at_least_one_cont_entry), ~ .,.names = "mod_e_{col}"))%>%
  dplyr::rename("mod_e_obs"="mod_e_obs_for_e")%>%
assign("CONS_C1_df_dup_JUN_2020_a_e",., envir = .GlobalEnv)

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#f     #Longest treatment
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      

invisible(c("here the hierarchy in terms of vulnerability is not clear. An inactive person may be better off than an unemployed one, or not; a salaried worker"))

set.seed(1234) #for tie resolution
CONS_C1_df_dup_JUN_2020_a_e%>%
  dplyr::mutate(dias_trat_imp_op= ifelse(is.na(fech_egres_imp),as.integer(lubridate::time_length(difftime(as.Date("2019-11-13"),as.Date(as.character(fech_ing))),"days")),dias_trat_imp))%>%
  dplyr::mutate(across(c(otras_sus1, otras_sus2,otras_sus3),~dplyr::case_when(as.character(.)!="Alcohol"&as.character(.)!="Cocaína"&as.character(.)!="Marihuana"&as.character(.)!="Pasta Base"~"Otros",TRUE~as.character(.)),.names = "{col}_mod"))%>%
  dplyr::mutate(rn_total=row_number())%>%
  
  dplyr::group_by(concat_hash_id_treatments)%>%
#dplyr::mutate(n_days_op_max= max(dias_trat_imp_op,na.rm=T))%>%
  mutate(rank_dias_trat_op  = rank(-dias_trat_imp_op, ties.method = "random"))%>% #negated so the longest entry gets rank 1 (descending order)
  dplyr::arrange(concat_hash_id_treatments,rank_dias_trat_op)%>%
  dplyr::mutate(across(c(con_quien_vive,tipo_de_plan_2_for_f,tipo_de_programa_2,estatus_ocupacional,cat_ocupacional,origen_ingreso,tipo_de_vivienda_mod,tenencia_de_la_vivienda_mod,rubro_trabaja_mod,sus_principal_mod,freq_cons_sus_prin,via_adm_sus_prin_act,otras_sus1_mod,otras_sus2_mod,otras_sus3_mod,via_adm_sus_prin), ~dplyr::first(na.omit(.)),.names = "mod_f_{col}"))%>%
  dplyr::ungroup()%>%
  #dplyr::rename("mod_f_tipo_de_plan_2_for_f"="mod_f_tipo_de_plan_2")%>%
  #dplyr::select(hash_key,concat_hash_id_treatments, n_concat_hash_id_treatments,con_quien_vive,tipo_de_plan_2_for_f,estatus_ocupacional,cat_ocupacional,origen_ingreso,tipo_de_vivienda_mod,tenencia_de_la_vivienda_mod,rubro_trabaja_mod,otras_sus1_mod,otras_sus2_mod,otras_sus3_mod,rank_dias_trat_op,dias_trat_imp_op,starts_with("mod_f_"))%>%dplyr::filter(concat_hash_id_treatments=="dd0d42261d00273d4e19ff2a46bda4b9_5276"|hash_key=="015ea90c1b1655155f30a3e276436ed5"|hash_key=="ffd3f4ed5841cfac947ce546757b8e3f")%>% 
  dplyr::arrange(rn_total)%>%
assign("CONS_C1_df_dup_JUN_2020_a_f",., envir = .GlobalEnv)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#g     #Replace with favored diagnoses if present
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

invisible(c("here I do not want to capture progression, only distinct values"))
library(tidyverse)
toString2<-
function (x, width = NULL, ...) 
        {
            string <- paste(x, collapse = "; ")
            if (missing(width) || is.null(width) || width == 0) 
                return(string)
            if (width < 0) 
                stop("'width' must be positive")
            if (nchar(string, type = "w") > width) {
                width <- max(6, width)
                string <- paste0(strtrim(string, width - 4), "....")
            }
            string
        }

#to define which values will be prioritized
  dg_trs_psiq_dsm_iv_or_cat<-CONS_C1_df_dup_JUN_2020 %>% dplyr::mutate(dg_trs_psiq_dsm_iv_or=stringr::str_trim(as.character(dg_trs_psiq_dsm_iv_or)))%>% janitor::tabyl(dg_trs_psiq_dsm_iv_or)%>% data.frame()%>% select(dg_trs_psiq_dsm_iv_or)%>% dplyr::filter(!dg_trs_psiq_dsm_iv_or %in% c("En estudio","Sin trastorno", NA))%>% unlist()%>% as.character()            
  dg_trs_psiq_cie_10_or_cat<-CONS_C1_df_dup_JUN_2020 %>% dplyr::mutate(dg_trs_psiq_cie_10_or=stringr::str_trim(as.character(dg_trs_psiq_cie_10_or)))%>% janitor::tabyl(dg_trs_psiq_cie_10_or)%>% data.frame()%>% select(dg_trs_psiq_cie_10_or)%>% dplyr::filter(!dg_trs_psiq_cie_10_or %in% c("En estudio(NA)","Sin trastorno(NA)", NA))%>% unlist()%>% as.character() 
  dg_trs_psiq_sub_dsm_iv_or_cat<-CONS_C1_df_dup_JUN_2020 %>% dplyr::mutate(dg_trs_psiq_sub_dsm_iv_or=stringr::str_trim(as.character(dg_trs_psiq_sub_dsm_iv_or)))%>% janitor::tabyl(dg_trs_psiq_sub_dsm_iv_or)%>% data.frame()%>% select(dg_trs_psiq_sub_dsm_iv_or)%>% unlist()%>% as.character() 
  dg_trs_psiq_sub_cie_10_or_cat<-CONS_C1_df_dup_JUN_2020 %>% dplyr::mutate(dg_trs_psiq_sub_cie_10_or=stringr::str_trim(as.character(dg_trs_psiq_sub_cie_10_or)))%>% janitor::tabyl(dg_trs_psiq_sub_cie_10_or)%>% data.frame()%>% select(dg_trs_psiq_sub_cie_10_or)%>% dplyr::filter(!dg_trs_psiq_sub_cie_10_or %in% c(NA))%>% unlist()%>% as.character() 
  diagnostico_trs_fisico_cat<-CONS_C1_df_dup_JUN_2020 %>% dplyr::mutate(diagnostico_trs_fisico=stringr::str_trim(as.character(diagnostico_trs_fisico)))%>% janitor::tabyl(diagnostico_trs_fisico)%>% data.frame()%>% select(diagnostico_trs_fisico)%>% dplyr::filter(!diagnostico_trs_fisico %in% c("En estudio","Sin trastorno", NA))%>% unlist()%>% as.character() 
  otros_probl_at_sm_or_cat<-CONS_C1_df_dup_JUN_2020 %>% dplyr::mutate(otros_probl_at_sm_or=stringr::str_trim(as.character(otros_probl_at_sm_or)))%>% janitor::tabyl(otros_probl_at_sm_or)%>% data.frame()%>% select(otros_probl_at_sm_or)%>% dplyr::filter(!otros_probl_at_sm_or %in% c("Sin otros problemas de salud mental", NA))%>% unlist()%>% as.character() 

CONS_C1_df_dup_JUN_2020_a_f%>%
  dplyr::mutate(rn_common_treats=rn_common_treats2)%>%
  #dplyr::mutate(across(c(dg_trs_psiq_dsm_iv_or,x2_dg_trs_psiq_dsm_iv_or,x3_dg_trs_psiq_dsm_iv_or,dg_trs_psiq_cie_10_or,x2_dg_trs_psiq_cie_10_or,x3_dg_trs_psiq_cie_10_or,diagnostico_trs_fisico), ~replace_na(as.character(.), 0),.names = "{col}_mod"))%>%
    #dplyr::mutate(across(c(dg_trs_psiq_dsm_iv_or_mod,x2_dg_trs_psiq_dsm_iv_or_mod,x3_dg_trs_psiq_dsm_iv_or_mod,dg_trs_psiq_cie_10_or_mod,x2_dg_trs_psiq_cie_10_or_mod,x3_dg_trs_psiq_cie_10_or_mod,diagnostico_trs_fisico_mod), ~dplyr::case_when(grepl("En estudio",as.character(.))~1,grepl("Sin trastorno",as.character(.))~0,TRUE~2)))%>%
  dplyr::mutate(across(c(dg_trs_psiq_dsm_iv_or,dg_trs_psiq_sub_dsm_iv_or,x2_dg_trs_psiq_dsm_iv_or,x2_dg_trs_psiq_sub_dsm_iv_or,x3_dg_trs_psiq_dsm_iv_or,x3_dg_trs_psiq_sub_dsm_iv_or,dg_trs_psiq_cie_10_or,dg_trs_psiq_sub_cie_10_or,x2_dg_trs_psiq_cie_10_or,x2_dg_trs_psiq_sub_cie_10_or,x3_dg_trs_psiq_cie_10_or,x3_dg_trs_psiq_sub_cie_10_or,diagnostico_trs_fisico,otros_probl_at_sm_or),~stringr::str_trim(.)))%>%
                tidyr::pivot_wider(names_from =  rn_common_treats, 
                                   names_sep="_",
                                   values_from =
c(dg_trs_psiq_dsm_iv_or,dg_trs_psiq_sub_dsm_iv_or,x2_dg_trs_psiq_dsm_iv_or,x2_dg_trs_psiq_sub_dsm_iv_or,x3_dg_trs_psiq_dsm_iv_or,x3_dg_trs_psiq_sub_dsm_iv_or,dg_trs_psiq_cie_10_or,dg_trs_psiq_sub_cie_10_or, x2_dg_trs_psiq_cie_10_or,x2_dg_trs_psiq_sub_cie_10_or,x3_dg_trs_psiq_cie_10_or,x3_dg_trs_psiq_sub_cie_10_or,diagnostico_trs_fisico,otros_probl_at_sm_or))%>%
  #c(dg_trs_psiq_dsm_iv_or_mod,x2_dg_trs_psiq_dsm_iv_or_mod,x3_dg_trs_psiq_dsm_iv_or_mod,dg_trs_psiq_cie_10_or_mod,x2_dg_trs_psiq_cie_10_or_mod,x3_dg_trs_psiq_cie_10_or_mod,diagnostico_trs_fisico_mod))%>%
    dplyr::group_by(concat_hash_id_treatments)%>%
    dplyr::mutate_at(vars(dg_trs_psiq_dsm_iv_or_1:otros_probl_at_sm_or_7),~max(as.character(.),na.rm=T))%>%
    #dplyr::mutate_at(vars(dg_trs_psiq_dsm_iv_or_mod_1:diagnostico_trs_fisico_mod_7),~max(as.numeric(.),na.rm=T))%>%
  dplyr::ungroup()%>%
    dplyr::mutate(mod_g_dg_trs_psiq_dsm_iv_or = pmap_chr(select(.,dg_trs_psiq_dsm_iv_or_1:dg_trs_psiq_dsm_iv_or_7,x2_dg_trs_psiq_dsm_iv_or_1:x2_dg_trs_psiq_dsm_iv_or_7,x3_dg_trs_psiq_dsm_iv_or_1:x3_dg_trs_psiq_dsm_iv_or_7), ~toString2(unique(na.omit(c(...))))))%>%
    dplyr::mutate(mod_g_dg_trs_psiq_sub_dsm_iv_or = pmap_chr(select(.,dg_trs_psiq_sub_dsm_iv_or_1:dg_trs_psiq_sub_dsm_iv_or_7,x2_dg_trs_psiq_sub_dsm_iv_or_1:x2_dg_trs_psiq_sub_dsm_iv_or_7,x3_dg_trs_psiq_sub_dsm_iv_or_1:x3_dg_trs_psiq_sub_dsm_iv_or_7), ~toString2(unique(na.omit(c(...))))))%>%
    #dplyr::mutate(mod_g_x2_dg_trs_psiq_dsm_iv_or = pmap_chr(select(.,x2_dg_trs_psiq_dsm_iv_or_1:x2_dg_trs_psiq_dsm_iv_or_7), ~toString2(unique(na.omit(c(...))))))%>%
   # dplyr::mutate(mod_g_x2_dg_trs_psiq_sub_dsm_iv_or = pmap_chr(select(.,x2_dg_trs_psiq_sub_dsm_iv_or_1:x2_dg_trs_psiq_sub_dsm_iv_or_7), ~toString2(unique(na.omit(c(...))))))%>%
    #dplyr::mutate(mod_g_x3_dg_trs_psiq_dsm_iv_or = pmap_chr(select(.,x3_dg_trs_psiq_dsm_iv_or_1:x3_dg_trs_psiq_dsm_iv_or_7), ~toString2(unique(na.omit(c(...))))))%>%
    #dplyr::mutate(mod_g_x3_dg_trs_psiq_sub_dsm_iv_or = pmap_chr(select(.,x3_dg_trs_psiq_sub_dsm_iv_or_1:x3_dg_trs_psiq_sub_dsm_iv_or_7), ~toString2(unique(na.omit(c(...))))))%>%
    dplyr::mutate(mod_g_dg_trs_psiq_cie_10_or = pmap_chr(select(.,dg_trs_psiq_cie_10_or_1:dg_trs_psiq_cie_10_or_7,x2_dg_trs_psiq_cie_10_or_1:x2_dg_trs_psiq_cie_10_or_7,x3_dg_trs_psiq_cie_10_or_1:x3_dg_trs_psiq_cie_10_or_7), ~toString2(unique(na.omit(c(...))))))%>%
    dplyr::mutate(mod_g_dg_trs_psiq_sub_cie_10_or = pmap_chr(select(.,dg_trs_psiq_sub_cie_10_or_1:dg_trs_psiq_sub_cie_10_or_7,x2_dg_trs_psiq_sub_cie_10_or_1:x2_dg_trs_psiq_sub_cie_10_or_7,x3_dg_trs_psiq_sub_cie_10_or_1:x3_dg_trs_psiq_sub_cie_10_or_7), ~toString2(unique(na.omit(c(...))))))%>%
    #dplyr::mutate(mod_g_x2_dg_trs_psiq_cie_10_or = pmap_chr(select(.,x2_dg_trs_psiq_cie_10_or_1:x2_dg_trs_psiq_cie_10_or_7), ~toString2(unique(na.omit(c(...))))))%>%
  #  dplyr::mutate(mod_g_x2_dg_trs_psiq_sub_cie_10_or = pmap_chr(select(.,x2_dg_trs_psiq_sub_cie_10_or_1:x2_dg_trs_psiq_sub_cie_10_or_7), ~toString2(unique(na.omit(c(...))))))%>%
   # dplyr::mutate(mod_g_x3_dg_trs_psiq_cie_10_or = pmap_chr(select(.,x3_dg_trs_psiq_cie_10_or_1:x3_dg_trs_psiq_cie_10_or_7), ~toString2(unique(na.omit(c(...))))))%>%
   # dplyr::mutate(mod_g_x3_dg_trs_psiq_sub_cie_10_or = pmap_chr(select(.,x3_dg_trs_psiq_sub_cie_10_or_1:x3_dg_trs_psiq_sub_cie_10_or_7), ~toString2(unique(na.omit(c(...))))))%>%
    dplyr::mutate(mod_g_diagnostico_trs_fisico = pmap_chr(select(.,diagnostico_trs_fisico_1:diagnostico_trs_fisico_7), ~toString2(unique(na.omit(c(...))))))%>%
    dplyr::mutate(mod_g_otros_probl_at_sm_or = pmap_chr(select(.,otros_probl_at_sm_or_1:otros_probl_at_sm_or_7), ~toString2(unique(na.omit(c(...))))))%>%
    #plyr::ungroup()%>%
    #dplyr::mutate_all(vars(dg_trs_psiq_dsm_iv_or_mod_1:diagnostico_trs_fisico_mod_7)) %>    unite(., col = "mod_g_x2_dg_trs_psiq_dsm_iv_or_mod", x2_dg_trs_psiq_dsm_iv_or_mod_1:x2_dg_trs_psiq_dsm_iv_or_mod_7, na.rm=TRUE, sep = "; ")%>%
    #unite(., col = "mod_g_dg_trs_psiq_dsm_iv_or_mod", dg_trs_psiq_dsm_iv_or_mod_1:dg_trs_psiq_dsm_iv_or_mod_7, na.rm=TRUE, sep = "; ")%>%
    #unite(., col = "mod_g_x2_dg_trs_psiq_dsm_iv_or_mod", x2_dg_trs_psiq_dsm_iv_or_mod_1:x2_dg_trs_psiq_dsm_iv_or_mod_7, na.rm=TRUE, sep = "; ")%>%
    #unite(., col = "mod_g_x3_dg_trs_psiq_dsm_iv_or_mod", x3_dg_trs_psiq_dsm_iv_or_mod_1:x3_dg_trs_psiq_dsm_iv_or_mod_7, na.rm=TRUE, sep = "; ")%>%
    #unite(., col = "mod_g_dg_trs_psiq_cie_10_or_mod", dg_trs_psiq_cie_10_or_mod_1:dg_trs_psiq_cie_10_or_mod_7, na.rm=TRUE, sep = "; ")%>%
    #unite(., col = "mod_g_x2_dg_trs_psiq_cie_10_or_mod", x2_dg_trs_psiq_cie_10_or_mod_1:x2_dg_trs_psiq_cie_10_or_mod_7, na.rm=TRUE, sep = "; ")%>%
    #unite(., col = "mod_g_x3_dg_trs_psiq_cie_10_or_mod", x3_dg_trs_psiq_cie_10_or_mod_1:x3_dg_trs_psiq_cie_10_or_mod_7, na.rm=TRUE, sep = "; ")%>%
    #unite(., col = "mod_g_diagnostico_trs_fisico_mod", diagnostico_trs_fisico_mod_1:diagnostico_trs_fisico_mod_7, na.rm=TRUE, sep = "; ")%>%

  # dplyr::select(hash_key,concat_hash_id_treatments,n_concat_hash_id_treatments,dg_trs_psiq_dsm_iv_or,x2_dg_trs_psiq_dsm_iv_or,x3_dg_trs_psiq_dsm_iv_or,dg_trs_psiq_cie_10_or,x2_dg_trs_psiq_cie_10_or,x3_dg_trs_psiq_cie_10_or,diagnostico_trs_fisico,starts_with("mod_g_"))%>%dplyr::filter(concat_hash_id_treatments=="dd0d42261d00273d4e19ff2a46bda4b9_5276"|hash_key=="015ea90c1b1655155f30a3e276436ed5"|hash_key=="ffd3f4ed5841cfac947ce546757b8e3f")%>% 
dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or),~dplyr::case_when(grepl(paste(dg_trs_psiq_dsm_iv_or_cat, collapse = "|"),.)~sub("En estudio|Sin trastorno",replacement= "",.),grepl("En estudio",.,ignore.case = T)~str_replace_all(., "Sin trastorno", ""),TRUE~.)))%>%
  
  #nchar>18 nchar("Trastorno Adaptativo")==20  nchar("En estudio")
dplyr::mutate(across(c(mod_g_dg_trs_psiq_cie_10_or),~dplyr::case_when(grepl(paste(dg_trs_psiq_cie_10_or_cat, collapse = "|"),.)~sub("En estudio\\(NA\\)|Sin trastorno\\(NA\\)",replacement= "",.),grepl("En estudio\\(NA\\)",.,ignore.case = T)~sub("Sin trastorno\\(NA\\)",replacement= "",.),TRUE~.)))%>%

dplyr::mutate(across(c(mod_g_diagnostico_trs_fisico),~dplyr::case_when(grepl(paste(diagnostico_trs_fisico_cat, collapse = "|"),.)~str_replace_all(., "En estudio|Sin trastorno", ""),grepl("En estudio",.,ignore.case = T)~str_replace_all(., "Sin trastorno", ""),TRUE~.)))%>%  
  dplyr::mutate(across(c(mod_g_otros_probl_at_sm_or),~dplyr::case_when(grepl(paste(otros_probl_at_sm_or_cat, collapse = "|"),.)~str_replace_all(., "Sin otros problemas de salud mental", ""),TRUE~.)))%>%  

    dplyr::mutate(across(c(mod_g_dg_trs_psiq_cie_10_or),~sub("Sin trastorno\\(NA\\); En estudio\\(NA\\)",replacement= "En estudio(NA)", ., perl=TRUE)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_cie_10_or),~sub("En estudio\\(NA\\); Sin trastorno\\(NA\\)",replacement= "En estudio(NA)", ., perl=TRUE)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_cie_10_or),~sub("^Sin trastorno\\(NA\\); ",replacement= "", ., perl=TRUE)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_cie_10_or),~sub("; Sin trastorno\\(NA\\)$",replacement="", ., perl=TRUE)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_cie_10_or),~sub("; Sin trastorno\\(NA\\);",replacement=";", ., perl=TRUE)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_cie_10_or),~sub("Trastornos de los hábitos y del control de los impulsos;",replacement="Trastornos de los hábitos y del control de los impulsos(F63);", ., perl=TRUE)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_cie_10_or),~sub("Trastornos de los hábitos y del control de los impulsos$",replacement="Trastornos de los hábitos y del control de los impulsos(F63)", .)))%>%

  dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or),~sub("En estudio; Sin trastorno",replacement="En estudio", ., perl=TRUE)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or),~sub("Sin trastorno; En estudio",replacement="En estudio", ., perl=TRUE)))%>%
  
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or),~sub("En estudio; Sin trastorno",replacement="En estudio", ., perl=TRUE)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or),~sub("Sin trastorno; En estudio",replacement="En estudio", ., perl=TRUE)))%>%
  
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or),~sub("; Sin trastorno;",replacement=";", ., perl=TRUE)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or,mod_g_dg_trs_psiq_sub_dsm_iv_or,mod_g_dg_trs_psiq_cie_10_or,mod_g_dg_trs_psiq_sub_cie_10_or,mod_g_diagnostico_trs_fisico,mod_g_otros_probl_at_sm_or),~sub("^; ; ", "", .)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or,mod_g_dg_trs_psiq_sub_dsm_iv_or,mod_g_dg_trs_psiq_cie_10_or,mod_g_dg_trs_psiq_sub_cie_10_or,mod_g_diagnostico_trs_fisico,mod_g_otros_probl_at_sm_or),~sub("^; ", "", .)))%>%
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or,mod_g_dg_trs_psiq_sub_dsm_iv_or,mod_g_dg_trs_psiq_cie_10_or,mod_g_dg_trs_psiq_sub_cie_10_or,mod_g_diagnostico_trs_fisico,mod_g_otros_probl_at_sm_or),~sub("; ; $", "", .)))%>% 
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or,mod_g_dg_trs_psiq_sub_dsm_iv_or,mod_g_dg_trs_psiq_cie_10_or,mod_g_dg_trs_psiq_sub_cie_10_or,mod_g_diagnostico_trs_fisico,mod_g_otros_probl_at_sm_or),~sub("; $", "", .)))%>% 
  dplyr::mutate(across(c(mod_g_dg_trs_psiq_dsm_iv_or,mod_g_dg_trs_psiq_sub_dsm_iv_or,mod_g_dg_trs_psiq_cie_10_or,mod_g_dg_trs_psiq_sub_cie_10_or,mod_g_diagnostico_trs_fisico,mod_g_otros_probl_at_sm_or),~sub("; ; ", "; ", .)))%>% 
   
   #dplyr::select(hash_key,concat_hash_id_treatments,n_concat_hash_id_treatments,starts_with("mod_g_"))%>%dplyr::filter(concat_hash_id_treatments=="dd0d42261d00273d4e19ff2a46bda4b9_5276"|hash_key=="015ea90c1b1655155f30a3e276436ed5"|hash_key=="ffd3f4ed5841cfac947ce546757b8e3f")%>%
  dplyr::ungroup()%>%
  #dplyr::mutate(across(c(rn_common_treats2,mod_g_dg_trs_psiq_dsm_iv_or,mod_g_dg_trs_psiq_sub_dsm_iv_or,mod_g_dg_trs_psiq_cie_10_or,mod_g_dg_trs_psiq_sub_cie_10_or,mod_g_diagnostico_trs_fisico,mod_g_otros_probl_at_sm_or),~str_count(., pattern = ";"),.names="{col}_cnt"))%>%     dplyr::select(rn_common_treats2,mod_g_dg_trs_psiq_dsm_iv_or_cnt,mod_g_dg_trs_psiq_sub_dsm_iv_or_cnt,mod_g_dg_trs_psiq_cie_10_or_cnt,mod_g_dg_trs_psiq_sub_cie_10_or_cnt,mod_g_diagnostico_trs_fisico_cnt,mod_g_otros_probl_at_sm_or_cnt)%>%summary()
  #dplyr::select(rn_common_treats2,mod_g_dg_trs_psiq_dsm_iv_or,mod_g_dg_trs_psiq_sub_dsm_iv_or,mod_g_dg_trs_psiq_cie_10_or,mod_g_dg_trs_psiq_sub_cie_10_or,mod_g_diagnostico_trs_fisico,mod_g_otros_probl_at_sm_or)%>%
assign("CONS_C1_df_dup_JUN_2020_a_g",., envir = .GlobalEnv)

#CONS_C1_df_dup_JUN_2020_a_g%>% janitor::tabyl(mod_g_dg_trs_psiq_cie_10_or)%>% copiar_nombres()

invisible(c("these are the summaries of how many distinct diagnoses a user can have",
            "rn           dsm4         dsm4_sub            cie10                cie10_sub           tr_fis                  otros_sm",
" Min.   :1.000     Min.   :0.0000     Min.   :0.00000     Min.   :0.0000      Min.   :0.0000      Min.   :0.00000       Min.   :0.0000
  1st Qu.:1.000     1st Qu.:0.0000     1st Qu.:0.00000     1st Qu.:0.0000      1st Qu.:0.0000      1st Qu.:0.00000       1st Qu.:0.0000                 
  Median :2.000     Median :0.0000     Median :0.00000     Median :1.0000      Median :0.0000      Median :0.00000       Median :0.0000                 
  Mean   :1.585     Mean   :0.0917     Mean   :0.04751     Mean   :0.8168      Mean   :0.0703      Mean   :0.02704       Mean   :0.1393                 
  3rd Qu.:2.000     3rd Qu.:0.0000     3rd Qu.:0.00000     3rd Qu.:1.0000      3rd Qu.:0.0000      3rd Qu.:0.00000       3rd Qu.:0.0000                 
  Max.   :7.000     Max.   :3.0000     Max.   :3.00000     Max.   :5.0000      Max.   :3.0000      Max.   :2.00000       Max.   :2.0000"))

CONS_C1_df_dup_JUN_2020_a_g%>%
  tidyr::separate(mod_g_dg_trs_psiq_dsm_iv_or,c("mod_g_dg_trs_psiq_dsm_iv_or","mod_g_x2_dg_trs_psiq_dsm_iv_or","mod_g_x3_dg_trs_psiq_dsm_iv_or","mod_g_x4_dg_trs_psiq_dsm_iv_or"), sep="; ")%>%
  tidyr::separate(mod_g_dg_trs_psiq_sub_dsm_iv_or,c("mod_g_dg_trs_psiq_sub_dsm_iv_or","mod_g_x2_dg_trs_psiq_sub_dsm_iv_or","mod_g_x3_dg_trs_psiq_sub_dsm_iv_or","mod_g_x4_dg_trs_psiq_sub_dsm_iv_or"), sep="; ")%>%
  tidyr::separate(mod_g_dg_trs_psiq_cie_10_or,c("mod_g_dg_trs_psiq_cie_10_or","mod_g_x2_dg_trs_psiq_cie_10_or","mod_g_x3_dg_trs_psiq_cie_10_or","mod_g_x4_dg_trs_psiq_cie_10_or","mod_g_x5_dg_trs_psiq_cie_10_or","mod_g_x6_dg_trs_psiq_cie_10_or"), sep="; ")%>%
  tidyr::separate(mod_g_dg_trs_psiq_sub_cie_10_or,c("mod_g_dg_trs_psiq_sub_cie_10_or","mod_g_x2_dg_trs_psiq_sub_cie_10_or","mod_g_x3_dg_trs_psiq_sub_cie_10_or","mod_g_x4_dg_trs_psiq_sub_cie_10_or"), sep="; ")%>%
#dplyr::select(c("mod_g_dg_trs_psiq_cie_10_or","mod_g_x2_dg_trs_psiq_cie_10_or","mod_g_x3_trs_psiq_cie_10_or","mod_g_x4_trs_psiq_cie_10_or","mod_g_x5_trs_psiq_cie_10_or","mod_g_x6_trs_psiq_cie_10_or"))%>% dplyr::filter(!is.na(mod_g_x6_trs_psiq_cie_10_or))%>% 
  assign("CONS_C1_df_dup_JUN_2020_a_g",., envir = .GlobalEnv)

#CONS_C1_df_dup_JUN_2020_a_g%>% janitor::tabyl(mod_g_diagnostico_trs_fisico)
#CONS_C1_df_dup_JUN_2020_a_g%>% janitor::tabyl(otros_probl_at_sm_or)  

  #dg_trs_psiq_sub_dsm_iv_or_cat dg_trs_psiq_sub_cie_10_or_cat
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#h     #Sum values
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

CONS_C1_df_dup_JUN_2020_a_g%>%
  dplyr::group_by(concat_hash_id_treatments)%>%
  dplyr::mutate(mod_h_dias_trat_imp_op=sum(dias_trat_imp_op,na.rm=T))%>%
  dplyr::mutate(mod_h_dias_trat_imp=sum(dias_trat_imp,na.rm=T))%>%
  dplyr::mutate(mod_h_dias_trat_inv=sum(dias_trat_inv,na.rm=T))%>%
  #dplyr::select(dias_trat_imp_op,dias_trat_imp,dias_trat_inv,mod_h_dias_trat_imp_op,mod_h_dias_trat_imp,mod_h_dias_trat_inv)%>% summary()
  slice(1)%>%
  dplyr::ungroup()%>%
  #dplyr::select(hash_key,concat_hash_id_treatments,n_concat_hash_id_treatments,dias_trat_imp_op,dias_trat_imp,dias_trat_inv,starts_with("mod_h_"))%>%dplyr::filter(concat_hash_id_treatments=="dd0d42261d00273d4e19ff2a46bda4b9_5276"|hash_key=="015ea90c1b1655155f30a3e276436ed5"|hash_key=="ffd3f4ed5841cfac947ce546757b8e3f")%>%
  dplyr::select(concat_hash_id_treatments,rn_common_treats2,matches("^mod_[0abcdefgh]_"))%>%
  dplyr::rename("tipo_de_plan_2_concat_a"="mod_a_tipo_de_plan_2")%>%
  dplyr::rename("id_centro_concat_a"="mod_a_id_centro")%>%
  dplyr::rename("obs_concat_a"="mod_a_obs")%>%

  assign("CONS_C1_df_dup_JUN_2020_a_h",., envir = .GlobalEnv)
invisible(c("FIRST OF ALL. Check which variables are suitable to be combined (e.g., the a or b ones that leave only one value)"))

CONS_C1_df_dup_JUN_2020_a_h%>%
  dplyr::rename_at(.vars = vars(matches("^mod_[abcdefgh]_")),
            .funs = ~sub("^mod_[abcdefgh]_", "", .))%>% #formula lambda instead of the deprecated funs()
  dplyr::mutate(tipo_de_plan_2=stringr::str_trim(tipo_de_plan_2))%>%
  dplyr::mutate(id_centro=stringr::str_trim(id_centro))%>%
  dplyr::select(mod_0_row,concat_hash_id_treatments,id_centro_concat_a, obs_concat_a,tipo_de_plan_2_concat_a,everything())%>%
  assign("CONS_C1_df_dup_JUN_2020_cont_treats",., envir = .GlobalEnv)
 #id_centro tipo_de_plan_2
#  dplyr::mutate(motivodeegreso_mod_imp_tidy= case_when(!is.na(diff_bet_treat) & as.character(motivodeegreso_mod_imp)=="Derivación" & grepl("Clínica",tipo_centro_derivacion==<90~"Abandono Temprano" , TRUE~as.character(motivodeegreso_mod_imp))%>% nrow()
invisible(c("watch out for tipo_de_plan_2_for_f and mod_e_obs_for_e"))
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#DISCARD REDUNDANT CASES
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
row_id_redundantes_trat_completos<-
      CONS_C1_df_dup_JUN_2020_a_g%>%
          dplyr::filter(!mod_0_row %in% unlist(CONS_C1_df_dup_JUN_2020_a_h$mod_0_row))%>%
          dplyr::select(mod_0_row)%>%unlist()%>%as.numeric()
#CONS_C1_df_dup_JUN_2020_a_h%>%select(matches("^mod_[b]_"))%>% names()%>% data.frame()%>% copiar_nombres()
#CONS_C1_df_dup_JUN_2020_a_h%>%select(matches("^mod_[0abcdefgh]_"))%>% names()%>% data.frame()%>% copiar_nombres()
#sus_principal_mod  


We started with 12,945 entries from 5,767 users, which were grouped into 6,136 sets of continuous entries, each representing a single treatment.
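As a rough illustration of how several continuous entries collapse into one row per treatment, the following minimal sketch applies three of the rules used above (value from the longest entry, distinct values concatenated, days summed) to toy data. The data frame `toy_entries` and its columns (`treatment_id`, `days`, `plan`, `diagnosis`) are hypothetical and do not exist in the database; the sketch is a simplification, not the actual procedure.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 2 are continuous entries of the same
# treatment; row 3 is a separate treatment of another user.
toy_entries <- data.frame(
  row          = 1:3,
  treatment_id = c("u1_1", "u1_1", "u2_1"),
  days         = c(30, 90, 45),
  plan         = c("Basic", "Intensive", "Basic"),
  diagnosis    = c("In study", "Anxiety", NA),
  stringsAsFactors = FALSE
)

collapsed <- toy_entries %>%
  group_by(treatment_id) %>%
  arrange(desc(days), .by_group = TRUE) %>%  # longest entry first within each treatment
  summarise(
    row       = min(row),                    # earliest row represents the treatment
    plan      = first(na.omit(plan)),        # value taken from the longest entry
    diagnosis = paste(unique(na.omit(diagnosis)), collapse = "; "),  # distinct values only
    days      = sum(days, na.rm = TRUE),     # total days across continuous entries
    .groups   = "drop"
  )

print(collapsed)
```

In this toy case the three entries become two treatments, in the same way that the 12,945 entries became 6,136 treatments.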


Join with the main dataset


We then joined the collapsed treatments back to the original database (n= 117,212; users= 85,603), producing a new database in which continuous entries are collapsed into single treatments (n= 110,403; users= 85,603).
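The join can be sketched on toy data as follows: redundant rows are dropped, the collapsed treatments are attached by row identifier with a `_cont_entries` suffix, and a `_final` column takes the collapsed value when one exists and the original value otherwise. The objects `original`, `collapsed`, `redundant_rows` and the column `center` are hypothetical names used only for this illustration.

```r
library(dplyr)

# Hypothetical original entries (one row per entry).
original <- data.frame(
  row    = 1:4,
  center = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical collapsed treatments: rows 1 and 2 form one treatment,
# represented by row 1, whose collapsed center is the last one ("B").
collapsed <- data.frame(
  row          = 1L,
  treatment_id = "u1_1",
  center       = "B",
  stringsAsFactors = FALSE
)
redundant_rows <- 2L  # entries absorbed into a collapsed treatment

joined <- original %>%
  filter(!row %in% redundant_rows) %>%
  left_join(collapsed, by = "row", suffix = c("", "_cont_entries")) %>%
  # rows without a collapsed treatment keep their original value
  mutate(center_final = ifelse(is.na(treatment_id), center, center_cont_entries))

print(joined)
```

Keeping both the original and the `_cont_entries` columns alongside the `_final` one makes it possible to check, row by row, which source each final value came from.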


#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#
#CONS_C1_df_dup_JUN_2020_a_h
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#
invisible(c("https://stackoverflow.com/questions/28298688/how-do-i-sweep-specific-columns-with-dplyr"))
invisible(c("https://stackoverflow.com/questions/54818931/difference-between-and-eval-tidy-in-mutate-at"))
invisible(c("https://stackoverflow.com/questions/63290366/mutate-across-multiple-variables-usinga-list-of-third-variables-in-r/63292975#63292975"))
invisible(c("https://stackoverflow.com/questions/51051810/how-do-i-use-mutate-at-with-multiple-functions-where-each-function-has-parameter"))

#CONS_C1_df_dup_MAY_2020_prev_6c%>% dplyr::filter(grepl('3.03.', obs)|grepl('3.03.', obs)) %>% nrow()

#this yields 110,403 entries

CONS_C1_df_dup_JUN_2020%>%
    dplyr::mutate(across(c(otras_sus1, otras_sus2,otras_sus3),~dplyr::case_when(as.character(.)!="Alcohol"&as.character(.)!="Cocaína"&as.character(.)!="Marihuana"&as.character(.)!="Pasta Base"~"Otros",TRUE~as.character(.)),.names = "{col}_mod"))%>%
   dplyr::mutate(across(c(dg_trs_psiq_dsm_iv_or,dg_trs_psiq_sub_dsm_iv_or,x2_dg_trs_psiq_dsm_iv_or,x2_dg_trs_psiq_sub_dsm_iv_or,x3_dg_trs_psiq_dsm_iv_or,x3_dg_trs_psiq_sub_dsm_iv_or,dg_trs_psiq_cie_10_or,dg_trs_psiq_sub_cie_10_or,x2_dg_trs_psiq_cie_10_or,x2_dg_trs_psiq_sub_cie_10_or,x3_dg_trs_psiq_cie_10_or,x3_dg_trs_psiq_sub_cie_10_or,diagnostico_trs_fisico,otros_probl_at_sm_or),~stringr::str_trim(.)))%>%
    dplyr::mutate(fech_ing=as.character(fech_ing))%>%
  dplyr::mutate(fech_egres_imp=as.character(fech_egres_imp))%>%
  dplyr::mutate(ano_bd2=ano_bd)%>%
  dplyr::mutate(across(c(sus_ini_2, sus_ini_3),~dplyr::case_when(as.character(.)!="Alcohol"&as.character(.)!="Cocaína"&as.character(.)!="Marihuana"&as.character(.)!="Pasta Base"~"Otros",TRUE~as.character(.)),.names = "{col}_mod"))%>%

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_          
  #REMOVE UNNECESSARY COLUMNS
#_#_#_#_#_#_#_#_#_#_#_#_#_  
  dplyr::select(-table, -region_del_centro, -tipo_de_programa, -tipo_de_plan, -dias_trat, -nmesesentratamiento, -dias_en_senda, -n_meses_en_senda, -sexo, -edad, -nombre_usuario, -comuna_residencia, -origen_de_ingreso, -pais_nacimiento, -etnia, -estado_conyugal, -parentesco_con_el_jefe_de_hogar, -num_trat_ant, -fecha_ultimo_tratamiento, -sustancia_de_inicio, -edad_inicio_consumo, -escolaridad_ultimo_ano_cursado, -condicion_ocupacional, -categoria_ocupacional, -rubro_trabaja, -tipo_de_vivienda, -tenencia_de_la_vivienda, -sustancia_principal, -`otras_sustancias_nº1`, -`otras_sustancias_nº2`, -`otras_sustancias_nº3`, -freq_cons_sus_prin_original, -edad_inicio_sustancia_principal, -via_adm_sus_prin_original, -sus_principal, -consentimiento_informado, -fech_egres, -motivodeegreso, -mot_egres_alt_adm_or, -consorcio, -fech_egres_sin_fmt, -ano_nac, -fech_ing_ano, -fech_ing_mes, -fech_ing_dia, -concat, -dias_trat_alta_temprana, -motivodeegreso_mod, -dias_trat_knn_imp, -fech_egres_knn_imp, -dias_trat_alta_temprana_knn_imp, -motivodeegreso_imp, -dias_trat_alta_temprana_imp, -concat_hash_sus_prin, -dg_trs_psiq_cie_10_egres_or, -menor_45_dias_diff, -menor_60_dias_diff)%>%
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#PERFORM THE JOIN
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
    dplyr::filter(!row %in%row_id_redundantes_trat_completos)%>%
    dplyr::left_join(CONS_C1_df_dup_JUN_2020_cont_treats, by=c("row"="mod_0_row"), suffix=c("","_cont_entries"))%>%
    assign("CONS_C1_df_dup_JUL_2020_prev0",., envir = .GlobalEnv)

  #names() %>% data.frame()%>% copiar_nombres()
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#CHECK VARIABLES CODED IN MORE THAN ONE WAY
#_#_#_#_#_#_#_ 
#####id_centro_concat_a joined  1. concatenated
#####obs_concat_a   joined  1. concatenated
#####tipo_de_plan_2_concat_a    joined  1. concatenated
#####tipo_de_plan_2_cont_entries    joined  2. last treatment (b)
#####id_centro_cont_entries joined  2. last treatment (b)
#####obs_cont_entries   joined  2. same value (e)
#####tipo_de_plan_2_for_f   joined  3. apparently the one corresponding to the longest treatment (f)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
out_cols <- c("id_centro_cont_entries","tipo_de_plan_2_cont_entries","obs_cont_entries")
out_cols_final <-paste0(gsub("_cont_entries", "", out_cols),"_final")

for(col in out_cols){
  old = gsub("_cont_entries", "", col)
  new = paste0(gsub("_cont_entries", "", col),"_final")
  
  CONS_C1_df_dup_JUL_2020_prev0%>% dplyr::mutate(!!sym(new):= ifelse(is.na(concat_hash_id_treatments), as.character(!!sym(old)), as.character(!!sym(col))))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev0",., envir = .GlobalEnv)
}
no_mostrar="si"
if(no_mostrar=="no"){
    CONS_C1_df_dup_JUL_2020_prev0%>%
      dplyr::filter(!is.na(concat_hash_id_treatments))%>%
      dplyr::select(concat_hash_id_treatments,!!(old),!!(out_cols),!!(out_cols_final))%>%
      View()
}
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#_#_#_#_#_#_#_
#a. Wide format
#_#_#_#_#_#_#_
#dplyr::mutate(mod_b_id_centro=sub(".*\\;","",mod_a_id_centro))%>% #last treatment

out_cols <- c("tipo_centro_cont_entries", "servicio_de_salud_cont_entries", "senda_cont_entries")
#  c(paste0(gsub("_cont_entries", "", out_cols),"_final"),gsub("_cont_entries", "", out_cols))
out_cols_final <-paste0(gsub("_cont_entries", "", out_cols),"_final")

CONS_C1_df_dup_JUL_2020_prev0%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_a",., envir = .GlobalEnv)

for(col in out_cols){
  old = gsub("_cont_entries", "", col)
  new = paste0(gsub("_cont_entries", "", col),"_final")
  
  CONS_C1_df_dup_JUL_2020_prev_a%>% dplyr::mutate(!!sym(new):= ifelse(is.na(concat_hash_id_treatments), as.character(!!sym(old)), as.character(!!sym(col))))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_a",., envir = .GlobalEnv)
}
no_mostrar="si"
if(no_mostrar=="no"){
    CONS_C1_df_dup_JUL_2020_prev_a%>%
      dplyr::filter(!is.na(concat_hash_id_treatments))%>%
      dplyr::select(concat_hash_id_treatments,!!(old),!!(out_cols),!!(out_cols_final))%>% 
      View()
}
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#_#_#_#_#_#_#_
#b. Maximum/ Last Value
#_#_#_#_#_#_#_
out_cols <- c("nombre_centro_cont_entries", "numero_de_hijos_mod_cont_entries","ano_bd_cont_entries", "num_hijos_trat_res_mod","tipo_centro_derivacion_cont_entries","fech_egres_imp_cont_entries","motivodeegreso_mod_imp_cont_entries","macrozona_cont_entries","nombre_region_cont_entries","comuna_residencia_cod_cont_entries")
#  c(paste0(gsub("_cont_entries", "", out_cols),"_final"),gsub("_cont_entries", "", out_cols))
out_cols_final <-paste0(gsub("_cont_entries", "", out_cols),"_final")
  
CONS_C1_df_dup_JUL_2020_prev_a%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_b",., envir = .GlobalEnv)

for(col in out_cols){
  old = gsub("_cont_entries", "", col)
  new = paste0(gsub("_cont_entries", "", col),"_final")
  
  CONS_C1_df_dup_JUL_2020_prev_b%>% dplyr::mutate(!!sym(new):= ifelse(is.na(concat_hash_id_treatments), as.character(!!sym(old)), as.character(!!sym(col))))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_b",., envir = .GlobalEnv)
}
no_mostrar="si"
if(no_mostrar=="no"){
    CONS_C1_df_dup_JUL_2020_prev_b%>%
      dplyr::filter(!is.na(concat_hash_id_treatments))%>%
      dplyr::select(concat_hash_id_treatments,!!(old),!!(out_cols),!!(out_cols_final))%>% 
      View()
}
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#_#_#_#_#_#_#_
#c. Minimum/ First value
#_#_#_#_#_#_#_
out_cols <- c("fech_ing_cont_entries","fecha_ingreso_a_convenio_senda_cont_entries","identidad_de_genero_cont_entries","edad_al_ing_cont_entries","origen_ingreso_mod_cont_entries","embarazo_cont_entries", "ano_bd2_cont_entries")
#  c(paste0(gsub("_cont_entries", "", out_cols),"_final"),gsub("_cont_entries", "", out_cols))
out_cols_final <-paste0(gsub("_cont_entries", "", out_cols),"_final")
  
CONS_C1_df_dup_JUL_2020_prev_b%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_c",., envir = .GlobalEnv)

for(col in out_cols){
  old = gsub("_cont_entries", "", col)
  new = paste0(gsub("_cont_entries", "", col),"_final")
  
  CONS_C1_df_dup_JUL_2020_prev_c%>% dplyr::mutate(!!sym(new):= ifelse(is.na(concat_hash_id_treatments), as.character(!!sym(old)), as.character(!!sym(col))))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_c",., envir = .GlobalEnv)
}

no_mostrar="si"
if(no_mostrar=="no"){
    CONS_C1_df_dup_JUL_2020_prev_c%>%
      dplyr::filter(!is.na(concat_hash_id_treatments))%>%
     dplyr::select(concat_hash_id_treatments,!!(old),!!(out_cols),!!(out_cols_final))%>% 
      View()
}
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#_#_#_#_#_#_#_
#d. Kept more vulnerable category
#_#_#_#_#_#_#_
out_cols <- c("x_se_trata_mujer_emb_cont_entries","compromiso_biopsicosocial_cont_entries","dg_global_nec_int_soc_or_cont_entries","dg_nec_int_soc_cap_hum_or_cont_entries","dg_nec_int_soc_cap_fis_or_cont_entries","dg_nec_int_soc_cap_soc_or_cont_entries","usuario_tribunal_trat_droga_cont_entries","evaluacindelprocesoteraputico_cont_entries","eva_consumo_cont_entries","eva_fam_cont_entries","eva_relinterp_cont_entries","eva_ocupacion_cont_entries","eva_sm_cont_entries","eva_fisica_cont_entries","eva_transgnorma_cont_entries","dg_global_nec_int_soc_or_1_cont_entries","dg_nec_int_soc_cap_hum_or_1_cont_entries","dg_nec_int_soc_cap_fis_or_1_cont_entries","dg_nec_int_soc_cap_soc_or_1_cont_entries","tiene_menores_de_edad_a_cargo_cont_entries","ha_estado_embarazada_egreso_cont_entries","discapacidad_cont_entries","opcion_discapacidad_cont_entries","escolaridad_cont_entries","edad_al_ing_grupos_cont_entries")
#  c(paste0(gsub("_cont_entries", "", out_cols),"_final"),gsub("_cont_entries", "", out_cols))
out_cols_final <-paste0(gsub("_cont_entries", "", out_cols),"_final")
  
CONS_C1_df_dup_JUL_2020_prev_c%>%
        dplyr::mutate(compromiso_biopsicosocial=dplyr::case_when(compromiso_biopsicosocial=="Leve"~1,compromiso_biopsicosocial=="Moderado"~2,compromiso_biopsicosocial=="Severo"~3,TRUE~NA_real_))%>%
    dplyr::mutate(escolaridad=dplyr::case_when(escolaridad=="Mayor a Ed Secundaria"~1,escolaridad=="Ed Secundaria Completa o Menor"~2,escolaridad=="Ed Primaria Completa o Menor"~3,TRUE~NA_real_))%>%
    dplyr::mutate(across(c(dg_global_nec_int_soc_or, dg_nec_int_soc_cap_hum_or, dg_nec_int_soc_cap_fis_or, dg_nec_int_soc_cap_soc_or,dg_global_nec_int_soc_or_1, dg_nec_int_soc_cap_hum_or_1, dg_nec_int_soc_cap_fis_or_1,dg_nec_int_soc_cap_soc_or_1),~dplyr::case_when(as.character(.)=="Bajas"~3,as.character(.)=="Medias"~2,as.character(.)=="Altas"~1,TRUE~NA_real_)))%>%
    dplyr::mutate(across(c(evaluacindelprocesoteraputico, eva_consumo, eva_fam, eva_relinterp, eva_ocupacion, eva_sm, eva_fisica, eva_transgnorma),~dplyr::case_when(as.character(.)=="Logro Mínimo"~3,as.character(.)=="Logro M?mo"~3,as.character(.)=="Logro Intermedio"~2,as.character(.)=="Logro Alto"~1,TRUE~NA_real_)))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_d",., envir = .GlobalEnv)

for(col in out_cols){
  old = gsub("_cont_entries", "", col)
  new = paste0(gsub("_cont_entries", "", col),"_final")
  
  CONS_C1_df_dup_JUL_2020_prev_d%>% dplyr::mutate(!!sym(new):= ifelse(is.na(concat_hash_id_treatments), as.character(!!sym(old)), as.character(!!sym(col))))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_d",., envir = .GlobalEnv)
}
no_mostrar="si"
if(no_mostrar=="no"){
    CONS_C1_df_dup_JUL_2020_prev_d%>%
      dplyr::filter(!is.na(concat_hash_id_treatments))%>%
     dplyr::select(concat_hash_id_treatments,!!(old),!!(out_cols),!!(out_cols_final))%>%
      View()
}
# Recode the "-Inf" strings in the *_final columns as NA
CONS_C1_df_dup_JUL_2020_prev_d%>%
    dplyr::mutate(across(dplyr::all_of(out_cols_final),~na_if(., "-Inf")))%>%
    dplyr::rename("ano_bd_last_final"="ano_bd_final")%>%
    dplyr::rename("ano_bd_first_final"="ano_bd2_final")%>%
   #                 dplyr::select(concat_hash_id_treatments,!!(old),!!(out_cols), ends_with("_final"))%>% 
  assign("CONS_C1_df_dup_JUL_2020_prev_d",., envir = .GlobalEnv)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#_#_#_#_#_#_#_
#e. Same value
#_#_#_#_#_#_#_
out_cols <- c("hash_key_cont_entries","id_cont_entries","nacionalidad_cont_entries","hash_rut_completo_cont_entries","sexo_2_cont_entries","embarazo_cont_entries","id_mod_cont_entries","fech_nac_cont_entries","edad_ini_cons_cont_entries","edad_ini_sus_prin_cont_entries","sus_ini_cont_entries","estado_conyugal_2_cont_entries","edad_grupos_cont_entries","etnia_cor_cont_entries","nacionalidad_2_cont_entries","etnia_cor_2_cont_entries","sus_ini_2_mod_cont_entries","sus_ini_3_mod_cont_entries","sus_ini_mod_cont_entries", "at_least_one_cont_entry_cont_entries")
#  c(paste0(gsub("_cont_entries", "", out_cols),"_final"),gsub("_cont_entries", "", out_cols))
out_cols_final <-paste0(gsub("_cont_entries", "", out_cols),"_final")
  
CONS_C1_df_dup_JUL_2020_prev_d%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_e",., envir = .GlobalEnv)

for(col in out_cols){
  old = gsub("_cont_entries", "", col)
  new = paste0(gsub("_cont_entries", "", col),"_final")
  
  CONS_C1_df_dup_JUL_2020_prev_e%>% 
    dplyr::mutate(!!sym(new):= ifelse(is.na(concat_hash_id_treatments), as.character(!!sym(old)), as.character(!!sym(col))))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_e",., envir = .GlobalEnv)
}
no_mostrar="si"
if(no_mostrar=="no"){
    CONS_C1_df_dup_JUL_2020_prev_e%>%
      dplyr::filter(!is.na(concat_hash_id_treatments))%>%
      dplyr::select(concat_hash_id_treatments,!!(old),!!(out_cols),!!(out_cols_final))%>% 
      View()
}
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#_#_#_#_#_#_#_
#f.Largest treatment
#_#_#_#_#_#_#_
out_cols <-c('con_quien_vive_cont_entries','estatus_ocupacional_cont_entries','cat_ocupacional_cont_entries','origen_ingreso_cont_entries','via_adm_sus_prin_act_cont_entries','sus_principal_mod_cont_entries',"freq_cons_sus_prin_cont_entries","tipo_de_vivienda_mod_cont_entries",'tenencia_de_la_vivienda_mod_cont_entries','rubro_trabaja_mod_cont_entries','otras_sus1_mod_cont_entries','otras_sus2_mod_cont_entries','otras_sus3_mod_cont_entries',
             'dg_trs_cons_sus_or_cont_entries','tipo_de_programa_2_cont_entries','tipo_de_plan_2_for_f')

invisible(c("dg_trs_cons_sus_or","tipo_de_programa_2","dg_trs_cons_sus_or_cont_entries","tipo_de_programa_2_cont_entries"))

out_cols_final <-paste0(gsub("_cont_entries", "", out_cols),"_final")
  
CONS_C1_df_dup_JUL_2020_prev_e%>%
  dplyr::mutate(tipo_de_plan_2_for_f_cont_entries=tipo_de_plan_2_for_f)%>%
  dplyr::mutate(tipo_de_plan_2_for_f=tipo_de_plan_2)%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_f",., envir = .GlobalEnv)

for(col in out_cols){
  old = gsub("_cont_entries", "", col)
  new = paste0(gsub("_cont_entries", "", col),"_final")
  
  CONS_C1_df_dup_JUL_2020_prev_f%>% dplyr::mutate(!!sym(new):= ifelse(is.na(concat_hash_id_treatments), as.character(!!sym(old)), as.character(!!sym(col))))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_f",., envir = .GlobalEnv)
}
no_mostrar="si"
if(no_mostrar=="no"){
    CONS_C1_df_dup_JUL_2020_prev_f%>%
      dplyr::filter(!is.na(concat_hash_id_treatments))%>%
      dplyr::select(concat_hash_id_treatments,!!(old),!!(out_cols),!!(out_cols_final))%>% 
      View()
}
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#_#_#_#_#_#_#_
#g. Favor dgs. - wide format
#_#_#_#_#_#_#_
out_cols <-c('dg_trs_psiq_dsm_iv_or_cont_entries','dg_trs_psiq_sub_dsm_iv_or_cont_entries','x2_dg_trs_psiq_dsm_iv_or_cont_entries','x2_dg_trs_psiq_sub_dsm_iv_or_cont_entries','x3_dg_trs_psiq_dsm_iv_or_cont_entries','x3_dg_trs_psiq_sub_dsm_iv_or_cont_entries','dg_trs_psiq_cie_10_or_cont_entries','dg_trs_psiq_sub_cie_10_or_cont_entries','x2_dg_trs_psiq_cie_10_or_cont_entries','x2_dg_trs_psiq_sub_cie_10_or_cont_entries','x3_dg_trs_psiq_cie_10_or_cont_entries','x3_dg_trs_psiq_sub_cie_10_or_cont_entries','diagnostico_trs_fisico_cont_entries','otros_probl_at_sm_or_cont_entries')

out_cols_final <-paste0(gsub("_cont_entries", "", out_cols),"_final")
  
CONS_C1_df_dup_JUL_2020_prev_f%>%
  dplyr::mutate(tipo_de_plan_2_for_f_cont_entries=tipo_de_plan_2_for_f)%>%
  dplyr::mutate(tipo_de_plan_2_for_f=tipo_de_plan_2)%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_g",., envir = .GlobalEnv)

for(col in out_cols){
  old = gsub("_cont_entries", "", col)
  new = paste0(gsub("_cont_entries", "", col),"_final")
  
  CONS_C1_df_dup_JUL_2020_prev_g%>% dplyr::mutate(!!sym(new):= ifelse(is.na(concat_hash_id_treatments), as.character(!!sym(old)), as.character(!!sym(col))))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_g",., envir = .GlobalEnv)
}
no_mostrar="si"
if(no_mostrar=="no"){
    CONS_C1_df_dup_JUL_2020_prev_g%>%
      dplyr::filter(!is.na(concat_hash_id_treatments))%>%
      dplyr::select(concat_hash_id_treatments,!!(old),!!(out_cols),!!(out_cols_final))%>% 
      View()
}
CONS_C1_df_dup_JUL_2020_prev_g%>%
  dplyr::mutate(across(c(x4_dg_trs_psiq_dsm_iv_or, x4_dg_trs_psiq_sub_dsm_iv_or, x4_dg_trs_psiq_cie_10_or, x5_dg_trs_psiq_cie_10_or, x6_dg_trs_psiq_cie_10_or, x4_dg_trs_psiq_sub_cie_10_or),~.,.names = "{col}_final"))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_g",., envir = .GlobalEnv)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#_#_#_#_#_#_#_
#h. Sum values
#_#_#_#_#_#_#_
out_cols <-c("dias_trat_imp_cont_entries","dias_trat_inv_cont_entries")

out_cols_final <-paste0(gsub("_cont_entries", "", out_cols),"_final")
  
CONS_C1_df_dup_JUL_2020_prev_g%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_h",., envir = .GlobalEnv)

for(col in out_cols){
  old = gsub("_cont_entries", "", col)
  new = paste0(gsub("_cont_entries", "", col),"_final")
  
  CONS_C1_df_dup_JUL_2020_prev_h%>% dplyr::mutate(!!sym(new):= ifelse(is.na(concat_hash_id_treatments), as.character(!!sym(old)), as.character(!!sym(col))))%>%
  assign("CONS_C1_df_dup_JUL_2020_prev_h",., envir = .GlobalEnv)
}
no_mostrar="si"
if(no_mostrar=="no"){
    CONS_C1_df_dup_JUL_2020_prev_h%>%
      dplyr::filter(!is.na(concat_hash_id_treatments))%>%
      dplyr::select(concat_hash_id_treatments,!!(old),!!(out_cols),!!(out_cols_final))%>% 
      View()
}
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_      
#_#_#_#_#_#_#_
#Consolidation

CONS_C1_df_dup_JUL_2020_prev_h%>%
dplyr::select(c('row', 'row_cont_entries','concat_hash_id_treatments','id_centro_concat_a','obs_concat_a','tipo_de_plan_2_concat_a','rn_common_treats2','id_centro_final','tipo_de_plan_2_final','obs_final','tipo_centro_final','servicio_de_salud_final','senda_final','nombre_centro_final','numero_de_hijos_mod_final','ano_bd_last_final','num_hijos_trat_res_mod_final','tipo_centro_derivacion_final','fech_egres_imp_final','motivodeegreso_mod_imp_final','macrozona_final','nombre_region_final','comuna_residencia_cod_final','fech_ing_final','fecha_ingreso_a_convenio_senda_final','identidad_de_genero_final','edad_al_ing_final','origen_ingreso_mod_final','ano_bd_first_final','x_se_trata_mujer_emb_final','compromiso_biopsicosocial_final','dg_global_nec_int_soc_or_final','dg_nec_int_soc_cap_hum_or_final','dg_nec_int_soc_cap_fis_or_final','dg_nec_int_soc_cap_soc_or_final','usuario_tribunal_trat_droga_final','evaluacindelprocesoteraputico_final','eva_consumo_final','eva_fam_final','eva_relinterp_final','eva_ocupacion_final','eva_sm_final','eva_fisica_final','eva_transgnorma_final','dg_global_nec_int_soc_or_1_final','dg_nec_int_soc_cap_hum_or_1_final','dg_nec_int_soc_cap_fis_or_1_final','dg_nec_int_soc_cap_soc_or_1_final','tiene_menores_de_edad_a_cargo_final','ha_estado_embarazada_egreso_final','discapacidad_final','opcion_discapacidad_final','escolaridad_final','edad_al_ing_grupos_final','hash_key_final','id_final','nacionalidad_final','hash_rut_completo_final','sexo_2_final','embarazo_final','id_mod_final','fech_nac_final','edad_ini_cons_final','edad_ini_sus_prin_final','estado_conyugal_2_final','edad_grupos_final','freq_cons_sus_prin_final','via_adm_sus_prin_act_final','etnia_cor_final','nacionalidad_2_final','etnia_cor_2_final','sus_ini_2_mod_final','sus_ini_3_mod_final','sus_ini_mod_final','at_least_one_cont_entry_final','con_quien_vive_final','estatus_ocupacional_final','cat_ocupacional_final','sus_principal_mod_final','tipo_de_vivienda_mod_final','tenencia_de_la_vivienda_mod_final','rubro_trabaja_mod_final','otras_sus1_mod_final','otras_sus2_mod_final','otras_sus3_mod_final','dg_trs_cons_sus_or_final','tipo_de_programa_2_final','tipo_de_plan_2_for_f_final','dg_trs_psiq_dsm_iv_or_final','dg_trs_psiq_sub_dsm_iv_or_final','x2_dg_trs_psiq_dsm_iv_or_final','x2_dg_trs_psiq_sub_dsm_iv_or_final','x3_dg_trs_psiq_dsm_iv_or_final','x3_dg_trs_psiq_sub_dsm_iv_or_final','dg_trs_psiq_cie_10_or_final','dg_trs_psiq_sub_cie_10_or_final','x2_dg_trs_psiq_cie_10_or_final','x2_dg_trs_psiq_sub_cie_10_or_final','x3_dg_trs_psiq_cie_10_or_final','x3_dg_trs_psiq_sub_cie_10_or_final','diagnostico_trs_fisico_final','otros_probl_at_sm_or_final','x4_dg_trs_psiq_dsm_iv_or_final','x4_dg_trs_psiq_sub_dsm_iv_or_final','x4_dg_trs_psiq_cie_10_or_final','x5_dg_trs_psiq_cie_10_or_final','x6_dg_trs_psiq_cie_10_or_final','x4_dg_trs_psiq_sub_cie_10_or_final','dias_trat_imp_final','dias_trat_inv_final'))%>%
  assign("CONS_C1_df_dup_JUL_2020_cons",., envir = .GlobalEnv)
  
#row_cont_entries - It is concatenated
#tipo_de_plan_2_cont_entries -  table(CONS_C1_df_dup_JUL_2020_prev_h$tipo_de_plan_2_cont_entries) - 2. last treatment (b)
#obs_cont_entries - obs with the same value - 2. same value (e)

invisible(c("6. Standardize the dates of admission and discharge at the end"))
invisible(c("7. Keep these and remove the mod prefix"))
invisible(c("9. This bind will have to take into account that some variables were converted to numeric, others were summarized as otras_sus_1, etc."))
invisible(c("11. Extend the validation and ordering of variables such as dg_nec_ and eva_ evaluacindelprocesoteraputico"))
invisible(c("12. Create the variable tipo de programa 2, based on the plans"))
invisible(c("13. Validate other substances so that they do not repeat the information of the previous otras sus"))
invisible(c("15. Check factors"))
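
Steps b to h above repeat the same loop: for every `*_cont_entries` column, a `*_final` column takes the collapsed value when `concat_hash_id_treatments` is present and the original value otherwise. A minimal sketch of that pattern as a reusable function follows. The function `add_final_cols()` and the toy data are hypothetical and are not part of the original pipeline.

```r
# Hypothetical helper (an assumption, not in the original code): builds
# "<name>_final" from "<name>" and "<name>_cont_entries".
add_final_cols <- function(df, cont_cols, flag = "concat_hash_id_treatments") {
  for (col in cont_cols) {
    old <- sub("_cont_entries$", "", col)   # original column
    new <- paste0(old, "_final")            # consolidated column
    # Rows without a concatenated id were not collapsed: keep the original value.
    df[[new]] <- ifelse(is.na(df[[flag]]),
                        as.character(df[[old]]),
                        as.character(df[[col]]))
  }
  df
}

# Toy data: the first row is a single entry, the second a collapsed treatment.
toy <- data.frame(
  concat_hash_id_treatments = c(NA, "a1; a2"),
  macrozona = c("Norte", "Sur"),
  macrozona_cont_entries = c(NA, "Centro"),
  stringsAsFactors = FALSE
)
res <- add_final_cols(toy, "macrozona_cont_entries")
res$macrozona_final
```

Returning the data frame instead of calling `assign()` on the global environment keeps each step testable on its own; whether that fits the rest of the workflow is a design choice left open here.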

Standardization of variables

We generated variables that compare each treatment with the following one, when the user had more than one treatment (obs_cambios). We also calculated the number of days in each treatment (dias_treat_imp_sin_na); when a treatment had no date of discharge, we used the difference in days between the date of admission and the date of retrieval of the dataset (2019-11-13). For analytical purposes, we added two categories to the cause of discharge for treatments without a discharge date: ongoing with fewer than 1,095 days of treatment ("En curso"), and ongoing with 1,095 days or more ("En curso (>=1095 d)"). We also included two variables that indicate whether a treatment was separated from the following treatment (if any) by fewer than 45 days (menor_45_dias_diff) or fewer than 60 days (menor_60_dias_diff). Finally, we generated the variable abandono_temprano to distinguish treatments that lasted at least three months (90 days) from those that did not.
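
The following sketch reproduces the logic of the duration and gap variables on toy data. It assumes the toy rows are sorted by admission date within each user, so `lead()` points at the following treatment; the pipeline on this page relies on its own row order instead. The helper column `fech_egres_filled` and the toy values are hypothetical.

```r
library(dplyr)

# Date of retrieval used to close treatments without a discharge date.
retrieval_date <- as.Date("2019-11-13")

toy <- data.frame(
  hash_key = c("u1", "u1", "u2"),
  fech_ing = as.Date(c("2015-01-10", "2015-05-01", "2018-03-01")),
  fech_egres_imp = as.Date(c("2015-04-10", "2015-12-01", NA))
)

res <- toy %>%
  arrange(hash_key, fech_ing) %>%
  group_by(hash_key) %>%
  mutate(
    # Missing discharge dates are replaced by the retrieval date.
    fech_egres_filled = coalesce(fech_egres_imp, retrieval_date),
    # Days in treatment, never missing after the replacement above.
    dias_treat_imp_sin_na = as.numeric(fech_egres_filled - fech_ing),
    # Days between this discharge and the next admission of the same user.
    diff_bet_treat = as.numeric(lead(fech_ing) - fech_egres_filled),
    menor_45_dias_diff = diff_bet_treat < 45,
    menor_60_dias_diff = diff_bet_treat < 60
  ) %>%
  ungroup()

res
```

Treatments with no following entry keep `NA` in `diff_bet_treat` and in both indicators, which makes it explicit that no comparison was possible.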


#       main= "Joint effect of changes in treatment characteristics and cause of \n discharge (referral) on the probability that the treatment lasts 60 days or less",

CONS_C1_df_dup_JUL_2020_cons%>%
    #rename_with() replaces the deprecated rename_at()/funs() combination
    dplyr::rename_with(~sub("_final$", "", .x), .cols = dplyr::matches("_final$"))%>%
    dplyr::select(row, row_cont_entries,hash_key,hash_rut_completo,id,id_mod,fech_ing,fech_egres_imp,tipo_de_plan_2,tipo_de_plan_2_for_f,tipo_de_plan_2_concat_a,tipo_de_programa_2,id_centro,nombre_centro,id_centro_concat_a,everything())%>%
  dplyr::relocate(ano_bd_first,ano_bd_last,obs,obs_concat_a,rn_common_treats2,concat_hash_id_treatments,at_least_one_cont_entry, .after = last_col())%>%
  dplyr::select(-contains("dias_trat"))%>%
  dplyr::mutate(senda_concat_a=senda)%>%
  dplyr::mutate(senda=sub(".*\\; ","",senda))%>%
  dplyr::mutate(tipo_centro_concat_a=tipo_centro)%>%
  dplyr::mutate(tipo_centro=sub(".*\\; ","",tipo_centro))%>%
  dplyr::rename("tipo_de_plan_2_largest_treat"="tipo_de_plan_2_for_f")%>%
  #CONS_C1_df_dup_JUL_2020_cons2%>%dplyr::mutate(esto=ifelse(tipo_de_plan_2_for_f!=tipo_de_plan_2,1,0))%>% filter(esto==1)%>% View()
  
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(fech_ing_num=as.numeric(as.Date(fech_ing)))%>%
  dplyr::mutate(fech_egres_num=as.numeric(as.Date(fech_egres_imp)))%>%
  dplyr::mutate(fech_egres_num=ifelse(is.na(fech_egres_imp),18213,fech_egres_num))%>% #equivalent to 2019-11-13
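  dplyr::mutate(fech_ing_next_treat=dplyr::lag(fech_ing_num))%>% #admission date of the adjacent treatment of the same user, so that diff_bet_treat is admission minus discharge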
  dplyr::mutate(diff_bet_treat=fech_ing_next_treat-fech_egres_num)%>%
  dplyr::mutate(id_centro_sig_trat=dplyr::lag(id_centro)) %>%
  dplyr::mutate(tipo_plan_sig_trat=dplyr::lag(tipo_de_plan_2)) %>%
  dplyr::mutate(tipo_programa_sig_trat=dplyr::lag(tipo_de_programa_2)) %>%
  dplyr::mutate(senda_sig_trat=dplyr::lag(senda)) %>%
  dplyr::ungroup()%>%
  #to keep only the relevant cases, those that can be compared with a following treatment. The others are not of interest
  dplyr::mutate(menor_60_dias_diff=case_when(diff_bet_treat<60~1,TRUE~0))%>%
  dplyr::mutate(menor_45_dias_diff= ifelse(diff_bet_treat<45,1,0))%>%
  dplyr::mutate(motivoegreso_derivacion=case_when(motivodeegreso_mod_imp=="Derivación"~1,TRUE~0))%>%
  dplyr::mutate(dias_treat_imp_sin_na=fech_egres_num-fech_ing_num)%>%
  dplyr::mutate(motivodeegreso_mod_imp=ifelse(dias_treat_imp_sin_na<1095 & is.na(fech_egres_imp),"En curso",as.character(motivodeegreso_mod_imp)))%>%
  dplyr::mutate(motivodeegreso_mod_imp=ifelse(dias_treat_imp_sin_na>=1095 & is.na(fech_egres_imp),"En curso (>=1095 d)",as.character(motivodeegreso_mod_imp)))%>%
  dplyr::mutate(motivodeegreso_mod_imp=ifelse(grepl("Abandono Temprano",motivodeegreso_mod_imp) & dias_treat_imp_sin_na>=90,"Abandono Tardio",as.character(motivodeegreso_mod_imp)))%>%
  dplyr::mutate(abandono_temprano=ifelse(dias_treat_imp_sin_na>=90,0,1)) %>%
  dplyr::mutate(abandono_temprano=as.factor(abandono_temprano)) %>%
  dplyr::mutate(abandono_temprano= dplyr::recode(abandono_temprano, "1"="Menos de 90 días", "0"="Mayor o igual a 90 días"))%>%
  
  dplyr::mutate(obs_cambios=case_when(id_centro_sig_trat!=id_centro~"1.1.cambio centro",TRUE~""))%>%
  dplyr::mutate(obs_cambios=case_when(tipo_plan_sig_trat!=tipo_de_plan_2~glue::glue("{obs_cambios};1.2.cambio tipo plan"),TRUE~obs_cambios))%>%
  dplyr::mutate(obs_cambios=case_when(tipo_programa_sig_trat!=tipo_de_programa_2~glue::glue("{obs_cambios};1.3.cambio tipo programa"),TRUE~obs_cambios))%>%
  dplyr::mutate(obs_cambios=case_when(senda_sig_trat!=senda~glue::glue("{obs_cambios};1.4.cambio senda"),TRUE~obs_cambios))%>%
  dplyr::mutate(obs_cambios_ninguno=case_when(obs_cambios==""~1,TRUE~0))%>%
  dplyr::mutate(obs_cambios_num=case_when(id_centro_sig_trat!=id_centro~1,TRUE~0))%>%
  dplyr::mutate(obs_cambios_num=case_when(tipo_plan_sig_trat!=tipo_de_plan_2~obs_cambios_num+1,TRUE~obs_cambios_num))%>%
  dplyr::mutate(obs_cambios_num=case_when(tipo_programa_sig_trat!=tipo_de_programa_2~obs_cambios_num+1,TRUE~obs_cambios_num))%>%
  dplyr::mutate(obs_cambios_num=case_when(senda_sig_trat!=senda~obs_cambios_num+1,TRUE~obs_cambios_num))%>%
  dplyr::mutate(obs_cambios_num=as.numeric(obs_cambios_num))%>%
  dplyr::mutate(obs_cambios_fac=obs_cambios_num)%>%
  dplyr::mutate(menor_45_dias_diff= recode(as.character(menor_45_dias_diff),"0"=">= 45 Days of Difference Between Entries","1"="<45 Days of Difference Between Entries"))%>%
  dplyr::mutate(menor_60_dias_diff= recode(as.character(menor_60_dias_diff),"0"=">= 60 Days of Difference Between Entries","1"="<60 Days of Difference Between Entries"))%>%
  dplyr::mutate(obs_cambios_ninguno= recode(as.character(obs_cambios_ninguno),"0"="At least 1 Change w/ the Next Entry","1"="No Changes w/ the Next Entry"))%>%
  dplyr::mutate(motivoegreso_derivacion= recode(as.character(motivoegreso_derivacion),"0"="Other causes of discharge","1"="Referral"))%>%
  dplyr::mutate_at(c('menor_45_dias_diff','menor_60_dias_diff','motivoegreso_derivacion','obs_cambios_ninguno','obs_cambios_fac'),~as.factor(.))%>%
  dplyr::mutate(via_adm_sus_prin_act=ifelse(grepl("Fumada o Pulmonar",via_adm_sus_prin_act),"Fumada o Pulmonar (aspiración de gases o vapores)",as.character(via_adm_sus_prin_act)))%>%
  assign("CONS_C1_df_dup_JUL_2020_cons2",., envir = .GlobalEnv)
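
The change indicators built above compare each treatment with the adjacent one and count how many characteristics differ. A compact sketch of the same idea on toy data is shown below; the columns `cambio_centro` and `cambio_plan` and the plan labels are hypothetical. Comparisons against a missing neighbour are treated as "no change", mirroring the `TRUE ~` fallback of `case_when()` used in the pipeline.

```r
library(dplyr)

toy <- data.frame(
  id_centro = c(10, 10, 20),
  id_centro_sig_trat = c(11, 10, NA),
  tipo_de_plan_2 = c("plan A", "plan B", "plan C"),
  tipo_plan_sig_trat = c("plan B", "plan B", NA),
  stringsAsFactors = FALSE
)

res <- toy %>%
  mutate(
    # NA comparisons (no adjacent treatment) count as no change.
    cambio_centro = coalesce(id_centro_sig_trat != id_centro, FALSE),
    cambio_plan = coalesce(tipo_plan_sig_trat != tipo_de_plan_2, FALSE),
    # Number of characteristics that changed.
    obs_cambios_num = cambio_centro + cambio_plan,
    # Text label listing the changes, empty when nothing changed.
    obs_cambios = case_when(
      cambio_centro & cambio_plan ~ "1.1.cambio centro;1.2.cambio tipo plan",
      cambio_centro ~ "1.1.cambio centro",
      cambio_plan ~ "1.2.cambio tipo plan",
      TRUE ~ ""
    )
  )

res
```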


We set the number of children/dependents (numero_de_hijos_mod) to missing when it exceeded 10 and either differed from the number of children admitted to residential treatment (num_hijos_trat_res_mod) or the latter was missing. When the user reported living with children, the number of children/dependents was 0, and the number of children admitted to residential treatment was greater than 0, we replaced numero_de_hijos_mod with num_hijos_trat_res_mod. We then set num_hijos_trat_res_mod to missing if the user still had no children/dependents but reported being admitted with at least one, and set numero_de_hijos_mod to missing if the user had no children but declared living with them. Finally, we grouped the age of onset of use of the primary substance into the variable edad_ini_sus_prin_grupos (<=15, 16-18, 19-24, >=25).
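
The order of these rules matters, because each one reads the values left by the previous one. The sketch below applies three of them to toy data; the `con_quien_vive` labels and the helper column `lives_with_children` are hypothetical.

```r
library(dplyr)

toy <- data.frame(
  con_quien_vive = c("Con hijos", "Solo", "Con hijos", "Con pareja e hijos"),
  numero_de_hijos_mod = c(0L, 0L, 0L, 2L),
  num_hijos_trat_res_mod = c(1L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

res <- toy %>%
  mutate(
    lives_with_children = grepl("hij", con_quien_vive, ignore.case = TRUE),
    # Rule 1: lives with children, reports 0 children, but was admitted with some.
    numero_de_hijos_mod = case_when(
      lives_with_children & numero_de_hijos_mod == 0 &
        num_hijos_trat_res_mod > 0 ~ num_hijos_trat_res_mod,
      TRUE ~ numero_de_hijos_mod
    ),
    # Rule 2: still 0 children but admitted with some: invalidate the latter.
    num_hijos_trat_res_mod = case_when(
      numero_de_hijos_mod == 0 & num_hijos_trat_res_mod > 0 ~ NA_integer_,
      TRUE ~ num_hijos_trat_res_mod
    ),
    # Rule 3: 0 children but declares living with them: invalidate the count.
    numero_de_hijos_mod = case_when(
      numero_de_hijos_mod == 0 & lives_with_children ~ NA_integer_,
      TRUE ~ numero_de_hijos_mod
    )
  )

res
```

Inside a single `mutate()`, later expressions see the columns as modified by earlier ones, so this is equivalent to chaining separate `mutate()` calls in the same order.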


invisible(c("1.Validate the number of children against the number of children admitted to residential treatment"))
invisible(c("2.tipo_centro_derivacion MAKE SURE IT HAS REFERRAL AS THE CAUSE OF DISCHARGE"))
invisible(c("num_hijos_trat_res numero_de_hijos_mod, num_hijos_trat_res_mod rubro_trabaja_mod tenencia_de_la_vivienda_mod tipo_de_vivienda_mod"))
#numero_de_hijos
#num_hijos_ing_trat_res
#edad_al_ing
#edad_ini_cons
#edad_ini_sus_prin

#Living arrangement in the 30 days prior to admission to treatment

CONS_C1_df_dup_JUL_2020_cons2%>%
  dplyr::mutate(across(c(numero_de_hijos_mod, num_hijos_trat_res_mod, edad_ini_cons,edad_ini_sus_prin),~as.integer(.)))%>%
dplyr::mutate(across(c(edad_al_ing),~as.numeric(.)))%>%
  #more than 10 children and (the number of children in residential treatment differs from the number of children | the number of children in residential treatment is missing)
    dplyr::mutate(numero_de_hijos_mod= dplyr::case_when(numero_de_hijos_mod>10 & (num_hijos_trat_res_mod!=numero_de_hijos_mod|is.na(num_hijos_trat_res_mod))~NA_integer_,TRUE~numero_de_hijos_mod))%>%
  #dplyr::filter(numero_de_hijos>10,num_hijos_ing_trat_res!=numero_de_hijos)%>% dplyr::select(numero_de_hijos_mod,num_hijos_ing_trat_res,numero_de_hijos)
  dplyr::mutate(numero_de_hijos_mod= dplyr::case_when(grepl("hij",con_quien_vive,ignore.case=T) & numero_de_hijos_mod==0 & num_hijos_trat_res_mod>0~num_hijos_trat_res_mod, TRUE~numero_de_hijos_mod))%>%
  #dplyr::filter(numero_de_hijos_mod==num_hijos_ing_trat_res,num_hijos_ing_trat_res>0,grepl("hij",con_quien_vive))%>%dplyr::select(numero_de_hijos_mod,num_hijos_ing_trat_res,numero_de_hijos,con_quien_vive) 
  dplyr::mutate(num_hijos_trat_res_mod= dplyr::case_when(numero_de_hijos_mod==0 & num_hijos_trat_res_mod>0 ~NA_integer_,TRUE~num_hijos_trat_res_mod))%>%
  dplyr::mutate(numero_de_hijos_mod= dplyr::case_when(numero_de_hijos_mod==0 & grepl("hij",con_quien_vive,ignore.case=T)~NA_integer_,TRUE~numero_de_hijos_mod))%>%
  dplyr::mutate(edad_ini_sus_prin_grupos=ifelse(edad_ini_sus_prin>=25,">=25",
                                                ifelse(edad_ini_sus_prin>18,"19-24",
                                                ifelse(edad_ini_sus_prin>15,"16-18",
                                                ifelse(edad_ini_sus_prin>0,"<=15",
                                                NA_character_)))))%>% 
assign("CONS_C1_df_dup_JUL_2020_cons3",., envir = .GlobalEnv)
invisible(c("2.tipo_centro_derivacion MAKE SURE IT HAS REFERRAL AS THE CAUSE OF DISCHARGE"))
invisible(
  tabyl(CONS_C1_df_dup_JUN_2020,tipo_centro_derivacion,motivodeegreso_mod_imp)
)
invisible(
tabyl(CONS_C1_df_dup_JUL_2020_cons3,tipo_centro_derivacion,motivodeegreso_mod_imp)
)
invisible(c("I had it wrong because it was replacing if it was not present"))
invisible(23+3+11+91+379+76+39+7+28+23945)


In the variable tiene_menores_de_edad_a_cargo (has minors under their care), we kept the answer "si" only for users who also reported at least one child/dependent (numero_de_hijos_mod > 0); other answers were recoded as "no".
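
A small sketch of this recoding on toy vectors shows how `ifelse()` treats missing values: when the number of children is missing and the answer is "si", the condition cannot be evaluated and the result stays missing. The toy values are hypothetical.

```r
numero_de_hijos_mod <- c(2L, 0L, 1L, NA)
tiene_menores_de_edad_a_cargo <- c("si", "si", "no", "si")

# "si" is kept only when the user also reports at least one child.
res <- ifelse(numero_de_hijos_mod > 0 & tiene_menores_de_edad_a_cargo == "si",
              "si", "no")
res
```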


#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
# Has minors under their care
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
invisible(c("10. Extend the validation of tiene_menores_de_edad_a_cargo if the user has children"))
invisible(
  CONS_C1_df_dup_JUL_2020_cons3%>% janitor::tabyl(tiene_menores_de_edad_a_cargo,numero_de_hijos_mod)
)
CONS_C1_df_dup_JUL_2020_cons3%>%
    dplyr::mutate(tiene_menores_de_edad_a_cargo=ifelse(numero_de_hijos_mod>0 & tiene_menores_de_edad_a_cargo=="si","si","no"))%>%
  assign("CONS_C1_df_dup_JUL_2020_cons4",., envir = .GlobalEnv)


We focused on cases whose treatments differed in gender identity, type of plan, and type of program. As noted by SENDA's professionals, the type of program is subordinate to the type of plan of each user. Although we kept the gender identity reported in intermediate entries, some information may have overlapped when the entries were collapsed into treatments, generating inconsistencies. This is why we checked for them, particularly in gender-specific programs.


We assumed that women, and users who reported a feminine gender identity, could be in a women-specific or a general-population program, whereas men with a masculine or unreported gender identity could only be in a general-population program. We are still investigating whether these inconsistencies stem from an incorrect assignment of sex or from a misclassification of the program.
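
Under that assumption, the cases of interest are users who are not recorded as women, do not report a feminine gender identity, and are nevertheless in a women-specific program. The sketch below flags them on toy data; the labels "Hombre", "Masculino", "Programa Mujeres" and "Programa General" are hypothetical stand-ins for the real categories.

```r
library(dplyr)

toy <- data.frame(
  sexo_2 = c("Mujer", "Hombre", "Hombre", "Hombre"),
  identidad_de_genero = c(NA, "Femenino", "Masculino", NA),
  tipo_de_programa_2 = c("Programa Mujeres", "Programa Mujeres",
                         "Programa Mujeres", "Programa General"),
  stringsAsFactors = FALSE
)

flagged <- toy %>%
  # Not recorded as a woman, but in a women-specific program.
  filter(sexo_2 != "Mujer", grepl("Muje", tipo_de_programa_2)) %>%
  # Gender identity is not feminine, or was not reported.
  filter(identidad_de_genero != "Femenino" | is.na(identidad_de_genero))

flagged
```

The explicit `is.na()` term is needed because `filter()` drops rows where the condition evaluates to `NA`, which would otherwise hide users with an unreported gender identity.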


#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
# Type of plan and type of program
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
invisible(
CONS_C1_df_dup_JUL_2020_cons4%>%
    dplyr::select(hash_key, fech_ing, fech_egres_imp)
)
#CONS_C1_df_dup_JUL_2020_cons4%>%janitor::tabyl(tipo_de_programa_2,tipo_de_plan_2,sexo_2)

invisible(c("a- not women, with a gender identity other than feminine, and in a women-specific type of program."))
  #CONS_C1_df_dup_JUL_2020_cons4%>% dplyr::filter(sexo_2!="Mujer",grepl("Muje",tipo_de_programa_2))%>% dplyr::filter(identidad_de_genero!="Femenino"|is.na(identidad_de_genero))
#tipo_de_plan_2_largest_treat
#CONS_C1_df_dup_JUL_2020_cons4%>% dplyr::filter(sexo_2!="Mujer",grepl("Muje",tipo_de_programa_2))%>% dplyr::filter(identidad_de_genero!="Femenino"|is.na(identidad_de_genero))%>% dplyr::select(obs)%>% dplyr::filter(grepl("2.6.01",obs)) #40 of 96 have another gender identity. Investigate.
invisible(c("check the centers that have 'mujeres' in their name; an inconsistency may be hidden there. Check whether the problem lies in the assignment of sex or of the program"))

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

muestra=1
if(muestra==0) {
###### 1.2. More than one value of sex per user (Duplicates 4) ##############
      casos_interes_sexo_tipo_programa_dup4<-
      CONS_C1_df_dup_JUL_2020_cons4%>% 
        dplyr::filter(sexo_2!="Mujer",grepl("Muje",tipo_de_programa_2))%>% 
        dplyr::filter(identidad_de_genero!="Femenino"|is.na(identidad_de_genero))  
        
      casos_interes_sexo_tipo_plan_dup4<-
        CONS_C1_df_dup_JUL_2020_cons4%>% 
        dplyr::filter(sexo_2!="Mujer",grepl("M-",tipo_de_plan_2))%>% 
        dplyr::filter(identidad_de_genero!="Femenino"|is.na(identidad_de_genero))
      
      casos_interes_sexo_tipo_programa_dup4%>%
        dplyr::select(row, row_cont_entries, hash_key, sexo_2, fech_ing, fech_egres_imp, tipo_de_plan_2,tipo_de_plan_2_largest_treat,tipo_de_programa_2,identidad_de_genero,obs)%>%
        left_join(CONS_C1_df_dup_ENE_2020_prev%>%janitor::clean_names()%>%dplyr::select(row,hash_key, sexo_2,tipo_de_plan_2, tipo_de_programa_2, identidad_de_genero),
                   by="hash_key", suffix=c("","_original"))%>% guardar_tablas("problema_programa_sexo")
      
      casos_interes_sexo_tipo_plan_dup4%>%
        dplyr::select(row, row_cont_entries, hash_key, sexo_2, fech_ing, fech_egres_imp, tipo_de_plan_2,tipo_de_plan_2_largest_treat,tipo_de_programa_2,identidad_de_genero,obs)%>%
        left_join(CONS_C1_df_dup_ENE_2020_prev%>%janitor::clean_names()%>%dplyr::select(row,hash_key, sexo_2,tipo_de_plan_2, tipo_de_programa_2, identidad_de_genero),
                  by="hash_key", suffix=c("","_original"))%>% guardar_tablas("problema_plan_sexo3")
}

invisible(c("a- not women, with a gender identity other than feminine, and in a women-specific type of program."))

problema_plan_sexo_analisis <- readxl::read_excel("G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/problema_plan_sexo analisis.xlsx",skip = 1)
#sexo_CAMBIOS SEXO (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)
#tipo_de_plan_CAMBIOS PLAN (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)
#tipo_de_programa_CAMBIOS PROGRAMA (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)
#identidad_de_genero_CAMBIOS PROGRAMA (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)
problema_plan_sexo_analisis_filter<-
problema_plan_sexo_analisis%>%
  dplyr::group_by(row)%>%
  dplyr::select(hash_key,`sexo_CAMBIOS SEXO (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`,`tipo_de_plan_CAMBIOS PLAN (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`,`tipo_de_programa_CAMBIOS PROGRAMA (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`,`identidad_de_genero_CAMBIOS PROGRAMA (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`)%>%
  slice(1)
#> Adding missing grouping variables: `row`
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#INCORPORATE THE CHANGES INTO THE MAIN DATABASE

#CONS_C1_df_dup_JUL_2020_cons4%>% dplyr::filter(sexo_2!="Mujer", identidad_de_genero!="Femenino",grepl("M-",tipo_de_plan_2))
#CONS_C1_df_dup_JUL_2020_cons4%>% dplyr::filter(sexo_2!="Mujer", is.na(identidad_de_genero),grepl("Muje",tipo_de_programa_2), grepl("M-",tipo_de_plan_2))

CONS_C1_df_dup_JUL_2020_cons4%>%
  #dplyr::mutate(tipo_de_programa_2= tipo_de_programa_2, tipo_de_plan_2_for_f)%>%
  #*6 tipo_de_programa tipo_de_plan: THE PLAN PREVAILS- M-PAI must be in the women's program. Change to women's if it is in general population// no cases
  dplyr::left_join(problema_plan_sexo_analisis_filter, by="row", suffix=c("","_sex_program"))%>%
  dplyr::mutate(changes= "")%>%
  dplyr::mutate(changes= dplyr::case_when(as.character(sexo_2)!=`sexo_CAMBIOS SEXO (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`~glue::glue("sex"),TRUE~""))%>%
  dplyr::mutate(changes= dplyr::case_when(as.character(tipo_de_plan_2)!=`tipo_de_plan_CAMBIOS PLAN (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`~glue::glue("{changes};plan"),TRUE~changes))%>%
  dplyr::mutate(changes= dplyr::case_when(as.character(tipo_de_programa_2)!=`tipo_de_programa_CAMBIOS PROGRAMA (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`~glue::glue("{changes};prog"),TRUE~changes))%>%
  dplyr::mutate(changes= dplyr::case_when(as.character(identidad_de_genero)!=`identidad_de_genero_CAMBIOS PROGRAMA (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`~glue::glue("{changes};gen"),TRUE~changes))%>%
  dplyr::mutate(changes=sub("^;","",changes))%>%
#Replace values if they were selected
  dplyr::mutate(sexo_2= ifelse(!is.na(hash_key_sex_program),`sexo_CAMBIOS SEXO (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`,sexo_2))%>%
  dplyr::mutate(tipo_de_plan_2= ifelse(!is.na(hash_key_sex_program),`tipo_de_plan_CAMBIOS PLAN (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`,tipo_de_plan_2))%>%
  dplyr::mutate(tipo_de_programa_2= ifelse(!is.na(hash_key_sex_program),`tipo_de_programa_CAMBIOS PROGRAMA (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`,tipo_de_programa_2))%>%
  dplyr::mutate(identidad_de_genero= ifelse(!is.na(hash_key_sex_program),`identidad_de_genero_CAMBIOS PROGRAMA (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`,identidad_de_genero))%>%
#glimpse(CONS_C1_df_dup_JUL_2020_cons4%>% select(sexo_2,tipo_de_plan_2,tipo_de_programa_2,identidad_de_genero))
  dplyr::mutate(obs=case_when(!is.na(hash_key_sex_program)~glue::glue("{obs};4.05.Inconsistent Sex, Gender or Treatment ({changes})"),TRUE~obs))%>%
  dplyr::select(-`tipo_de_plan_CAMBIOS PLAN (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`,-`tipo_de_programa_CAMBIOS PROGRAMA (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`,-`identidad_de_genero_CAMBIOS PROGRAMA (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`,-changes)%>%
  dplyr::ungroup()%>%
  #dplyr::filter(grepl("4.05.",obs))%>%  janitor::tabyl(obs) #para ver si el obs.
      assign("CONS_C1_df_dup_JUL_2020_cons5a",., envir = .GlobalEnv)

CONS_C1_df_dup_JUL_2020_cons5a%>%
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(dis_sex=n_distinct(sexo_2))%>%
  #dplyr::filter(dis_sex>1)%>%
  dplyr::rename("sexo_rec"=`sexo_CAMBIOS SEXO (ID EMBARAZO NOMBE CENTRO Y OTRAS VARIABLES USER-INVARIANT DEBIESE SER POR HASH)`)%>%
  dplyr::mutate(sexo_rec= max(sexo_rec,na.rm=T))%>%
  dplyr::mutate(sexo_rec=ifelse(is.na(sexo_rec),sexo_2,sexo_rec))%>%
  dplyr::mutate(sexo_2=ifelse(sexo_2!=sexo_rec,sexo_rec,sexo_2))%>%
  #dplyr::select(row,hash_key,fech_ing,fech_egres_imp,ano_bd_first,ano_bd_last,sexo_2,embarazo, identidad_de_genero,tipo_de_plan_2,tipo_de_programa_2,senda, obs, sexo_rec)#x_se_trata_de_una_mujer_embarazada
  dplyr::mutate(id_mod=ifelse(sexo_2=="Mujer",`substr<-`(id_mod,5,5,"2"),`substr<-`(id_mod,5,5,"1")))%>%
  dplyr::mutate(id=ifelse(sexo_2=="Mujer",`substr<-`(id,5,5,"2"),`substr<-`(id,5,5,"1")))%>%
  dplyr::ungroup()%>%
  dplyr::mutate(centro_muj=ifelse(grepl("(Mujeres)",nombre_centro,ignore.case=T),1,0))%>%
  dplyr::select(-sexo_rec,-dis_sex)%>%
  assign("CONS_C1_df_dup_JUL_2020_cons6",., envir = .GlobalEnv)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#INCORPORATE THE CHANGES INTO THE MAIN DATABASE
invisible(c("a- they are not women, have a gender identity other than female, and are in a women's type of program."))

CONS_C1_df_dup_JUL_2020_cons6%>% 
  dplyr::filter(sexo_2!="Mujer",grepl("Muje",tipo_de_programa_2))%>%
  dplyr::filter(identidad_de_genero!="Femenino"|is.na(identidad_de_genero))%>%
  nrow()
[1] 0
  #janitor::tabyl(progr_incoherente)%>%
  #dplyr::mutate(progr_incoherente= dplyr::case_when(sexo_2!="Mujer" & grepl("Muje",tipo_de_programa_2) & (identidad_de_genero!="Femenino"|is.na(identidad_de_genero))~1,TRUE~0))%>%janitor::tabyl(progr_incoherente)
  #dplyr::filter(progr_incoherente==1)

muestra=1
if(muestra==0) {
    CONS_C1_df_dup_ENE_2020_prev6%>%
      janitor::clean_names()%>%
      dplyr::filter(hash_key %in% unlist(problema_plan_sexo_analisis_filter[,"hash_key"]))%>%
      dplyr::select(row,hash_key,fech_ing,fech_egres,ano_bd,sexo_2,embarazo, identidad_de_genero,tipo_de_plan_2,tipo_de_programa_2,senda, x_se_trata_de_una_mujer_embarazada, obs)
}
  #dplyr::mutate(sexo_2.1=as.factor(sexo_2.1))%>%
invisible(c("dg_trs_cons_sus_or","tipo_de_programa_2","dg_trs_cons_sus_or_cont_entries","tipo_de_programa_2_cont_entries"))


For a number of users the recorded sex was changed, and that change had to be carried over to the other entries of these users (users=13; n=40).
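The propagation of a corrected, user-invariant value to every entry of the same user can be sketched as follows. This is a minimal base R illustration on made-up data, not the pipeline above: the toy rows and the helper `first_non_na` are assumptions introduced for the example, and only the column names `hash_key`, `sexo_2` and `sexo_rec` come from the code.

```r
# Toy data (hypothetical): user "u1" has one entry with a corrected value.
toy <- data.frame(
  hash_key = c("u1", "u1", "u1", "u2", "u2"),
  sexo_2   = c("Hombre", "Hombre", "Mujer", "Hombre", "Hombre"),
  sexo_rec = c(NA, NA, "Mujer", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value of a vector, or NA if none.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else x[1]
}

# Spread the corrected value to every entry of the same user.
corrected <- ave(toy$sexo_rec, toy$hash_key, FUN = first_non_na)

# Use the correction where one exists; otherwise keep the recorded value.
toy$sexo_2 <- ifelse(is.na(corrected), toy$sexo_2, corrected)

# Number of distinct values of sexo_2 per user after the correction.
n_sex_by_user <- tapply(toy$sexo_2, toy$hash_key, function(x) length(unique(x)))
print(toy)
print(n_sex_by_user)
```

After this step each user in the toy data carries a single value of `sexo_2`, which is the property the check on `dis_sex` in the code above is meant to inspect.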


#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
# Opcion_discapacidad
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
invisible(c("14. Validate opcion_discapacidad against discapacidad _---> it is fine"))
invisible(c("16. I could bring in, e.g., opcion_discapacidad from a previous database, in case the person had mentioned it"))
CONS_C1_df_dup_JUL_2020_cons6%>% janitor::tabyl(discapacidad,opcion_discapacidad)


We decided to order specific variables by severity or level of vulnerability, assigning an arbitrarily greater number to the categories with greater vulnerability or severity.


CONS_C1_df_dup_JUL_2020_cons6%>% 
    dplyr::mutate(compromiso_biopsicosocial=dplyr::case_when(compromiso_biopsicosocial=="1"~"1-Leve",
                                                         compromiso_biopsicosocial=="2"~"2-Moderado",
                                                         compromiso_biopsicosocial=="3"~"3-Severo",TRUE~NA_character_))%>%
    dplyr::mutate(escolaridad=dplyr::case_when(escolaridad=="1"~"1-Mayor a Ed Secundaria",
                                               escolaridad=="2"~"2-Ed Secundaria Completa o Menor",
                                               escolaridad=="3"~"3-Ed Primaria Completa o Menor",TRUE~NA_character_))%>%
    dplyr::mutate(across(c(dg_global_nec_int_soc_or, dg_nec_int_soc_cap_hum_or, dg_nec_int_soc_cap_fis_or, dg_nec_int_soc_cap_soc_or,dg_global_nec_int_soc_or_1, dg_nec_int_soc_cap_hum_or_1, dg_nec_int_soc_cap_fis_or_1,dg_nec_int_soc_cap_soc_or_1),~dplyr::case_when(as.character(.)=="3"~"3-Bajas",as.character(.)=="2"~"2-Medias",as.character(.)=="1"~"1-Altas",TRUE~NA_character_)))%>%
    dplyr::mutate(across(c(evaluacindelprocesoteraputico, eva_consumo, eva_fam, eva_relinterp, eva_ocupacion, eva_sm, eva_fisica, eva_transgnorma),~dplyr::case_when(as.character(.)=="3"~"3-Logro Minimo",as.character(.)=="2"~"2-Logro Intermedio",as.character(.)=="1"~"1-Logro Alto",TRUE~NA_character_)))%>%
  dplyr::mutate(across(c(compromiso_biopsicosocial, escolaridad,dg_global_nec_int_soc_or, dg_nec_int_soc_cap_hum_or, dg_nec_int_soc_cap_fis_or, dg_nec_int_soc_cap_soc_or,dg_global_nec_int_soc_or_1, dg_nec_int_soc_cap_hum_or_1, dg_nec_int_soc_cap_fis_or_1,dg_nec_int_soc_cap_soc_or_1,evaluacindelprocesoteraputico, eva_consumo, eva_fam, eva_relinterp, eva_ocupacion, eva_sm, eva_fisica, eva_transgnorma),~as.factor(.)))%>%
  assign("CONS_C1_df_dup_JUL_2020_cons7",., envir = .GlobalEnv)
invisible(c("17. Check occupational category and status now to look for inconsistencies"))
CONS_C1_df_dup_JUL_2020_cons7%>% janitor::tabyl(estatus_ocupacional,cat_ocupacional,rubro_trabaja_mod)
invisible(c("18. Pregnancies should be recovered for cases that were reclassified as women"))
CONS_C1_df_dup_JUL_2020_cons7%>% janitor::tabyl(sexo_2,embarazo)
invisible(c("19. Women's centers among men"))
CONS_C1_df_dup_JUL_2020_cons7%>% 
  janitor::tabyl(centro_muj,sexo_2)
#there is a group of men in women's centers
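One practical consequence of prefixing each category with its rank is that the alphabetical order of the labels matches the severity order. The sketch below illustrates this on made-up data; using `max()` to keep the most severe category when several entries are collapsed is an assumption made for the example, not a rule taken from the code above.

```r
# Toy entries (hypothetical) that would be collapsed by user.
toy <- data.frame(
  hash_key = c("u1", "u1", "u2"),
  compromiso_biopsicosocial = c("1-Leve", "3-Severo", "2-Moderado"),
  stringsAsFactors = FALSE
)

# Because every label starts with its rank, the alphabetical maximum
# is also the most severe category recorded for each user.
most_severe <- tapply(toy$compromiso_biopsicosocial, toy$hash_key, max)
print(most_severe)
```

The same idea applies to any of the recoded variables, as long as the numeric prefix has a single digit so that string comparison and numeric comparison agree.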


Additionally, we corrected the categories of other problems related to mental health (otros_probl_at_sm_or), and removed the label stating that the user had no other mental health problems (“Sin otros problemas de salud mental”) when it appeared after other categories.


invisible(janitor::tabyl(CONS_C1_df_dup_JUL_2020_cons7,otros_probl_at_sm_or))

CONS_C1_df_dup_JUL_2020_cons7%>%
  dplyr::mutate(otros_probl_at_sm_or=sub("; Sin otros problemas de salud mental","",otros_probl_at_sm_or))%>%
  dplyr::mutate(otros_probl_at_sm_or=sub("Explotaci\\?omercial Sexual","Explotación Comercial Sexual",otros_probl_at_sm_or))%>%
  dplyr::mutate(otros_probl_at_sm_or=sub("Prisionizaci\\?","Prisionalización",otros_probl_at_sm_or))%>%
  dplyr::mutate(tenencia_de_la_vivienda_mod=sub("Ocupaci\\?","Ocupación Irregular",tenencia_de_la_vivienda_mod))%>%

  dplyr::mutate_at(vars(c("otros_probl_at_sm_or", "dg_trs_psiq_dsm_iv_or", "dg_trs_psiq_sub_dsm_iv_or", "dg_trs_psiq_sub_cie_10_or")),~dplyr::case_when(.==""~NA_character_,TRUE~.))%>%
  
  assign("CONS_C1_df_dup_JUL_2020_cons11",., envir = .GlobalEnv)
CONS_C1_df_dup_JUL_2020_cons11%>%
dplyr::mutate(across(c(dg_trs_psiq_cie_10_or,x2_dg_trs_psiq_cie_10_or,x3_dg_trs_psiq_cie_10_or,x4_dg_trs_psiq_cie_10_or,x5_dg_trs_psiq_cie_10_or,x6_dg_trs_psiq_cie_10_or),~dplyr::case_when(grepl("En estudio",as.character(.),ignore.case = T)~1,grepl("Sin trastorno",as.character(.),ignore.case = T)~0,is.na(.)~0,TRUE~0),.names = "{col}_mod1a"))%>%
  dplyr::mutate(total_cie_10_en_est = base::rowSums(dplyr::select(.,ends_with("_mod1a"))))%>%  
  dplyr::mutate(across(c(dg_trs_psiq_cie_10_or,x2_dg_trs_psiq_cie_10_or,x3_dg_trs_psiq_cie_10_or,x4_dg_trs_psiq_cie_10_or,x5_dg_trs_psiq_cie_10_or,x6_dg_trs_psiq_cie_10_or),~dplyr::case_when(grepl("En estudio",as.character(.),ignore.case = T)~0,grepl("Sin trastorno",as.character(.),ignore.case = T)~0,is.na(.)~0,TRUE~1),.names = "{col}_mod2a"))%>%
  dplyr::mutate(total_cie_10_dg = base::rowSums(dplyr::select(.,ends_with("_mod2a"))))%>%  
    
  dplyr::mutate(cie_10=dplyr::case_when(total_cie_10_dg>0 & total_cie_10_en_est>0~"Diagnosticado/a (1 o más)",
                       total_cie_10_dg>0 & total_cie_10_en_est==0~"Diagnosticado/a (1 o más)",
                       total_cie_10_dg==0 & total_cie_10_en_est>0~"En estudio",
                       TRUE~"Sin información diagnóstica"))%>%
  
  #The DSM-IV counts start from the first DSM-IV column (dg_trs_psiq_dsm_iv_or), not from the first ICD-10 column
dplyr::mutate(across(c(dg_trs_psiq_dsm_iv_or,x2_dg_trs_psiq_dsm_iv_or,x3_dg_trs_psiq_dsm_iv_or,x4_dg_trs_psiq_dsm_iv_or),~dplyr::case_when(grepl("En estudio",as.character(.),ignore.case = T)~1,grepl("Sin trastorno",as.character(.),ignore.case = T)~0,is.na(.)~0,TRUE~0),.names = "{col}_mod1b"))%>%
  dplyr::mutate(total_dsm_iv_en_est = base::rowSums(dplyr::select(.,ends_with("_mod1b"))))%>%  
dplyr::mutate(across(c(dg_trs_psiq_dsm_iv_or,x2_dg_trs_psiq_dsm_iv_or,x3_dg_trs_psiq_dsm_iv_or,x4_dg_trs_psiq_dsm_iv_or),~dplyr::case_when(grepl("En estudio",as.character(.),ignore.case = T)~0,grepl("Sin trastorno",as.character(.),ignore.case = T)~0,is.na(.)~0,TRUE~1),.names = "{col}_mod2b"))%>%
  dplyr::mutate(total_dsm_iv_dg = base::rowSums(dplyr::select(.,ends_with("_mod2b"))))%>%  
  
  dplyr::mutate(dsm_iv=dplyr::case_when(total_dsm_iv_dg>0 & total_dsm_iv_en_est>0~"Diagnosticado/a (1 o más)",
                       total_dsm_iv_dg>0 & total_dsm_iv_en_est==0~"Diagnosticado/a (1 o más)",
                       total_dsm_iv_dg==0 & total_dsm_iv_en_est>0~"En estudio",
                       TRUE~"Sin información diagnóstica"))%>%
  dplyr::select(-ends_with("_mod2b"),-ends_with("_mod2a"),-ends_with("_mod1a"),-ends_with("_mod1b"),
                -total_cie_10_dg,-total_cie_10_en_est,-total_dsm_iv_dg,-total_dsm_iv_en_est)%>%
  
assign("CONS_C1_df_dup_JUL_2020_cons12",., envir = .GlobalEnv)
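The logic used to build `cie_10` and `dsm_iv` can be illustrated on a small made-up table: each diagnosis column is flagged as "under study" or as a confirmed diagnosis, the flags are added by row, and the totals decide the summary label. This is a base R sketch; the columns `dx1` and `dx2` and the toy values are hypothetical, and only the three output labels are taken from the code above.

```r
# Toy data (hypothetical): two diagnosis columns per entry.
toy <- data.frame(
  dx1 = c("En estudio", "Trastorno del humor", "Sin trastorno", NA),
  dx2 = c(NA, "En estudio", NA, NA),
  stringsAsFactors = FALSE
)
dx_cols <- c("dx1", "dx2")

# Flag each cell: under study, explicitly without disorder, or missing.
is_under_study <- sapply(toy[dx_cols], function(x) grepl("En estudio", x, ignore.case = TRUE))
is_no_disorder <- sapply(toy[dx_cols], function(x) grepl("Sin trastorno", x, ignore.case = TRUE))
is_missing     <- sapply(toy[dx_cols], is.na)

# Anything else counts as a confirmed diagnosis.
is_confirmed <- !is_under_study & !is_no_disorder & !is_missing

n_under_study <- rowSums(is_under_study)
n_confirmed   <- rowSums(is_confirmed)

# A confirmed diagnosis takes priority over a diagnosis under study.
toy$summary <- ifelse(n_confirmed > 0, "Diagnosticado/a (1 o más)",
                ifelse(n_under_study > 0, "En estudio",
                       "Sin información diagnóstica"))
print(toy)
```

Note that, as in the code above, an entry whose only value is "Sin trastorno" ends up in the same group as an entry with no diagnostic information at all.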


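We also restricted the number of dependants at admission to a residential treatment (num_hijos_trat_res_mod) to residential plans only: values greater than zero recorded under any other type of plan were set to missing.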


#table(CONS_C1_df_dup_JUL_2020_cons12$tipo_de_plan_2,CONS_C1_df_dup_JUL_2020_cons12$num_hijos_trat_res_mod)%>% View()
#
CONS_C1_df_dup_JUL_2020_cons12%>%
  dplyr::mutate(num_hijos_trat_res_mod=dplyr::case_when(
    !grepl("pr",tipo_de_plan_2,ignore.case=T) & num_hijos_trat_res_mod>0~NA_integer_,
    TRUE~num_hijos_trat_res_mod))%>%
assign("CONS_C1_df_dup_JUL_2020_cons13",., envir = .GlobalEnv)
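A small illustration of this restriction on made-up data follows. It is a base R sketch under the assumption, taken from the pattern used above, that residential plans are the ones whose name contains "PR"; the toy rows are hypothetical.

```r
# Toy data (hypothetical): plan type and number of dependants at admission.
toy <- data.frame(
  tipo_de_plan_2 = c("PG-PR", "M-PR", "PG-PAB", "M-PAI"),
  num_hijos_trat_res_mod = c(2L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Residential plans are identified by "PR" in the plan name.
is_residential <- grepl("PR", toy$tipo_de_plan_2, ignore.case = TRUE)

# Positive counts outside residential plans are treated as invalid.
invalid <- which(!is_residential & toy$num_hijos_trat_res_mod > 0)
toy$num_hijos_trat_res_mod[invalid] <- NA_integer_
print(toy)
```

Zeros recorded under non-residential plans are left untouched, which mirrors the `num_hijos_trat_res_mod>0` condition in the code above.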


In the first steps of the normalization of the database, we collapsed the different plan types into the following: PG-PAB, PG-PAI, PG-PR, M-PAB, M-PAI, and M-PR. Other plans and programs were grouped into the General Population Programs due to their low prevalence (1.2%). In addition, the notes on the data normalization procedures made by SENDA's professionals indicate that the type of plan takes precedence over the type of program. This is why we changed the type of program depending on the type of plan.


#table(CONS_C1_df_dup_JUL_2020_cons14$tipo_de_plan_2,CONS_C1_df_dup_JUL_2020_cons14$tipo_de_programa_2)
#table(CONS_C1$Tipo.de.Plan,CONS_C1$Tipo.de.Programa)
CONS_C1_df_dup_JUL_2020_cons13%>%
  dplyr::mutate(tipo_de_programa_2=dplyr::case_when(
    grepl("M-",tipo_de_plan_2,ignore.case=T)~"Programa Específico Mujeres",
    grepl("PG-",tipo_de_plan_2,ignore.case=T) & grepl("Muj",tipo_de_programa_2,ignore.case=T)~"Programa Población General",
    grepl("M-",tipo_de_plan_2,ignore.case=T) & !grepl("Muj",tipo_de_programa_2,ignore.case=T)~"Programa Específico Mujeres",
    grepl("PG-",tipo_de_plan_2,ignore.case=T) & grepl("Otro|Alcohol|Calles|Vigilada",tipo_de_programa_2,ignore.case=T)~"Programa Población General",
    TRUE~tipo_de_programa_2))%>%
  assign("CONS_C1_df_dup_JUL_2020_cons14",., envir = .GlobalEnv)
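The precedence of the plan over the program can be sketched on made-up data as follows. This base R example is a simplification: it assumes only the two program labels shown, whereas the code above leaves other program labels of general population plans unchanged unless they match the listed patterns.

```r
# Toy data (hypothetical): plan and program may disagree.
toy <- data.frame(
  tipo_de_plan_2     = c("M-PAI", "PG-PAB", "PG-PR", "M-PR"),
  tipo_de_programa_2 = c("Programa Población General", "Programa Específico Mujeres",
                         "Programa Población General", "Programa Específico Mujeres"),
  stringsAsFactors = FALSE
)

is_women_plan   <- grepl("^M-", toy$tipo_de_plan_2)
is_general_plan <- grepl("^PG-", toy$tipo_de_plan_2)

# The plan decides the program; anything else keeps its recorded program.
toy$tipo_de_programa_2 <- ifelse(is_women_plan, "Programa Específico Mujeres",
                           ifelse(is_general_plan, "Programa Población General",
                                  toy$tipo_de_programa_2))
print(toy)
```

In `dplyr::case_when()` the first matching condition wins, so in the code above the first rule already assigns every "M-" plan to the women's program and the third rule adds nothing further.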




#janitor::tabyl(CONS_C1_df_dup_JUL_2020_cons14$con_quien_vive)

CONS_C1_df_dup_JUL_2020_cons14b<-
CONS_C1_df_dup_JUL_2020_cons14 %>% 
  dplyr::mutate(con_quien_vive_rec=dplyr::case_when(
    grepl("Solo$",con_quien_vive, ignore.case=T)~"Solo",
    
    grepl("Con abuelos",con_quien_vive, ignore.case=T)~"Con familiares",
    grepl("Con hermanos",con_quien_vive, ignore.case=T)~"Con familiares",
    grepl("Con la madre \\(sola\\)",con_quien_vive, ignore.case=T)~"Con familiares",
    grepl("Con otro pariente",con_quien_vive, ignore.case=T)~"Con familiares",
    grepl("con hijos y padres o familia",con_quien_vive, ignore.case=T)~"Con familiares",
    grepl("con la pareja y padres o familia de origen",con_quien_vive, ignore.case=T)~"Con familiares",
    grepl("con padres o familia de origen",con_quien_vive, ignore.case=T)~"Con familiares",
    
    grepl("Únicamente con hijos",con_quien_vive, ignore.case=T)~"Únicamente con hijos",
    
    grepl("Únicamente con pareja",con_quien_vive, ignore.case=T)~"Únicamente con pareja",
    
    grepl("Hijos y Padres o Familia de Origen",con_quien_vive, ignore.case=T)~"Con pareja e hijos",
    grepl("Únicamente con la pareja e hijos",con_quien_vive, ignore.case=T)~"Con pareja e hijos",
    grepl("Únicamente con hijos",con_quien_vive, ignore.case=T)~"Únicamente con hijos",
    
    grepl("Con amigos",con_quien_vive, ignore.case=T)~"Otros",
    grepl("Con otro NO pariente",con_quien_vive, ignore.case=T)~"Otros",
    grepl("*Otros$",con_quien_vive, ignore.case=T)~"Otros")) #%>% 
    #janitor::tabyl(con_quien_vive, con_quien_vive_rec)

Cases with more than 1095 days of treatment

Finally, we focused on the treatment days of some cases, because collapsing treatments may add days to a treatment and result in very long treatments (>1095 days) that SENDA's professionals would not consider valid. We distinguished between cases of users that had more than one treatment and cases that did not have an available date of discharge.


#table(CONS_C1_df_dup_JUL_2020_cons14$tipo_de_plan_2,CONS_C1_df_dup_JUL_2020_cons14$tipo_de_programa_2)
#table(CONS_C1$Tipo.de.Plan,CONS_C1$Tipo.de.Programa)

mas_1095_usuarios<-
      CONS_C1_df_dup_JUL_2020_cons14b%>%
        dplyr::filter(dias_treat_imp_sin_na>1095)%>%
        dplyr::distinct(hash_key)%>%
        unlist()%>%
        as.character()

df_tab5<-
CONS_C1_df_dup_JUL_2020_cons14b%>%
    dplyr::filter(hash_key %in% mas_1095_usuarios)%>%
    dplyr::select(hash_key, ano_bd_first, ano_bd_last, fech_ing, dias_treat_imp_sin_na, fech_egres_imp,tipo_de_plan_2,senda)%>%
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(count=n(),rn=row_number())%>%
  dplyr::filter(ano_bd_first<=2015, rn==row_number(),dias_treat_imp_sin_na>1095)

#Individuals with a single case, without a date of discharge, ordered by days of treatment.
in_un_caso_dias_trat_mas_1095<-
CONS_C1_df_dup_JUL_2020_cons14b%>%
    dplyr::filter(hash_key %in% mas_1095_usuarios)%>%
    dplyr::select(hash_key, ano_bd_first, ano_bd_last, fech_ing, dias_treat_imp_sin_na, fech_egres_imp,tipo_de_plan_2,senda)%>%
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(count=n(),rn=row_number())%>%
  dplyr::filter(count==1)%>%
  dplyr::arrange(desc(dias_treat_imp_sin_na))

  #tidyr::pivot_wider(names_from=rn,values_from=fech_ing)%>%
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#TABLE OF CAUSES OF DISCHARGE BY QUARTER
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
invisible=5
if(invisible==4){
CONS_C1_df_dup_JUL_2020_cons14b%>%
  dplyr::mutate(date_by_quarter = lubridate::round_date(as.Date(fech_ing), unit="quarter"))%>%
  janitor::tabyl(date_by_quarter,motivodeegreso_mod_imp)
}
to_string <- as_labeller(c(`0` = "More than one entry by user (n=163)", `1` = "Only one entry by user (n=555)"))

desc_dias_mas_1095<-
CONS_C1_df_dup_JUL_2020_cons14b%>%
    dplyr::mutate(in_un_caso_dias_trat_mas_1095=ifelse(hash_key %in% unlist(in_un_caso_dias_trat_mas_1095["hash_key"]),1,0))%>%
    dplyr::mutate(in_un_caso_dias_trat_mas_1095=factor(in_un_caso_dias_trat_mas_1095))%>%
    dplyr::filter(dias_treat_imp_sin_na>1095)%>%
    ungroup()%>%
    dplyr::mutate(vacio_fech_egres=ifelse(!is.na(fech_egres_imp),1,0))%>%
    dplyr::select(dias_treat_imp_sin_na,vacio_fech_egres,in_un_caso_dias_trat_mas_1095)%>%
    dplyr::mutate(group=dplyr::case_when(vacio_fech_egres==1 & in_un_caso_dias_trat_mas_1095==1~"With missing date of discharge & only one case",vacio_fech_egres==0 & in_un_caso_dias_trat_mas_1095==1~"With date of discharge & only one case",vacio_fech_egres==1 & in_un_caso_dias_trat_mas_1095==0~"With missing date of discharge & more than one case",vacio_fech_egres==0 & in_un_caso_dias_trat_mas_1095==0~"With date of discharge & more than one case",TRUE~NA_character_))%>%
    dplyr::group_by(group)%>%
    summarise(
        `n` = n(),
        `Mdn` = median(dias_treat_imp_sin_na, na.rm = TRUE),
        `IQR` = IQR(dias_treat_imp_sin_na, na.rm = TRUE),
        `Perc. 25`= quantile(dias_treat_imp_sin_na,.25,na.rm=T),
        `Perc. 75`= quantile(dias_treat_imp_sin_na,.75,na.rm=T))

  library(gridExtra)
fig<-
CONS_C1_df_dup_JUL_2020_cons14b%>%
    dplyr::mutate(in_un_caso_dias_trat_mas_1095=ifelse(hash_key %in% unlist(in_un_caso_dias_trat_mas_1095["hash_key"]),1,0))%>%
  dplyr::mutate(in_un_caso_dias_trat_mas_1095=factor(in_un_caso_dias_trat_mas_1095))%>%
  dplyr::filter(dias_treat_imp_sin_na>1095)%>%
      ungroup()%>%
    dplyr::mutate(vacio_fech_egres=ifelse(!is.na(fech_egres_imp),1,0))%>%
    dplyr::select(dias_treat_imp_sin_na,vacio_fech_egres,in_un_caso_dias_trat_mas_1095)%>%
ggplot(aes(x = factor(vacio_fech_egres), y = dias_treat_imp_sin_na, group=factor(vacio_fech_egres))) +
    geom_boxplot() +
      geom_jitter(shape = 15,
        color = "steelblue",
        position = position_jitter(width = 0.21)) +
    theme_classic()+
  labs(x="Date of Discharge Not Available (=1)",
       y="Days in treatment until the date of retrieval of the dataset")+
  facet_wrap(~in_un_caso_dias_trat_mas_1095,ncol = 2, labeller = to_string)

tt <- ttheme_default(colhead=list(fg_params = list(parse=TRUE)),
                     base_size = 7.5, padding = unit(c(3, 4), "mm"))
tbl <- tableGrob(desc_dias_mas_1095, rows=NULL,theme=tt)

grid.arrange(fig, tbl, 
             nrow = 2, heights = c(4, 1),
             as.table = TRUE)
Figure 6. Criteria to Transform Variables


#:#:#:#:#:#:#:#:#:#:#
#REGRESSION
if(invisible==4){
model<-
    CONS_C1_df_dup_JUL_2020_cons14b%>%
        dplyr::mutate(in_un_caso_dias_trat_mas_1095=ifelse(hash_key %in% unlist(in_un_caso_dias_trat_mas_1095["hash_key"]),1,0))%>%
      dplyr::mutate(in_un_caso_dias_trat_mas_1095=factor(in_un_caso_dias_trat_mas_1095))%>%
      dplyr::filter(dias_treat_imp_sin_na>1095)%>%
          ungroup()%>%
        dplyr::mutate(vacio_fech_egres=factor(ifelse(!is.na(fech_egres_imp),1,0)))%>%
        dplyr::select(dias_treat_imp_sin_na,vacio_fech_egres,in_un_caso_dias_trat_mas_1095)
  library(lsmeans)
  refR <- lsmeans(lm(dias_treat_imp_sin_na~ vacio_fech_egres*in_un_caso_dias_trat_mas_1095, data=model),
                  specs = c("vacio_fech_egres","in_un_caso_dias_trat_mas_1095"))
  g4R <- ggplot(data.frame(refR), aes(x= vacio_fech_egres, y=lsmean,group=in_un_caso_dias_trat_mas_1095, colour=in_un_caso_dias_trat_mas_1095))+
  geom_errorbar(aes(ymin=lower.CL, ymax=upper.CL), width=.1,position=position_dodge(0.1), size=1) +
  geom_point(position=position_dodge(0.1), size=2)+
    xlab("Date of discharge is empty (=1)")+
    ylab("Days of treatment")+
    sjPlot::theme_sjplot2() +
    geom_rect_interactive(alpha = 0.1, xmin=.1, xmax=.1, ymin=.1,ymax=.1) +
      # Remove plot elements added by geom_rect_interactive
    theme(legend.position="bottom")+
    guides(color=guide_legend(ncol=4,name = "Cause of Discharge"))+
    labs(color="More than one entry by user")+
    scale_colour_brewer(palette = "Set1",labels=c("More than one entry by user", "Only one entry by user"))+
    theme(legend.title = element_blank())
  g4R
}


There were 718 entries with more than 1095 days of treatment, which affected 950 users. As seen in the figure above, cases with more than 2000 days of treatment could be considered anomalies, except in the group of cases with only one entry by user and a date of discharge, in which more than 50% of the treatments lasted over 1800 days.


cases_w_intermediate_entries_w_more_1095d<-
CONS_C1_df_dup_JUL_2020_cons14b%>%
    dplyr::filter(hash_key %in% mas_1095_usuarios)%>%
    dplyr::arrange(hash_key,desc(fech_ing))%>%
    dplyr::group_by(hash_key)%>%
    dplyr::mutate(count=n(),rn=row_number())%>%
    dplyr::ungroup()%>%
    dplyr::mutate(mas_1095d_trat_intermedio=dplyr::case_when(rn>1 & dias_treat_imp_sin_na>1095~1,TRUE~0))%>%
    dplyr::group_by(hash_key)%>%
    dplyr::mutate(users_w_prob_cases=sum(mas_1095d_trat_intermedio))%>%
    dplyr::ungroup()%>%
  #To inspect the anomalous and intermediate cases
  #dplyr::filter(users_w_prob_cases==1)%>%
  dplyr::select(row,hash_key, fech_ing, dias_treat_imp_sin_na, fech_egres_imp,tipo_de_plan_2,senda,mas_1095d_trat_intermedio,rn,count,row_cont_entries,users_w_prob_cases)%>%
  #View()
    dplyr::filter(mas_1095d_trat_intermedio>0)

no_mostrar=1
if(no_mostrar==0){
  invisible(c("important: the merged cases can reach up to 3000 days"))
CONS_C1_df_dup_JUL_2020_cons14b%>%
    dplyr::filter(dias_treat_imp_sin_na>1095)%>%
    dplyr::filter(!is.na(row_cont_entries))%>%
    summarise(
        `n` = n(),
        `Mdn` = median(dias_treat_imp_sin_na, na.rm = TRUE),
        `IQR` = IQR(dias_treat_imp_sin_na, na.rm = TRUE),
        `Perc. 25`= quantile(dias_treat_imp_sin_na,.25,na.rm=T),
        `Perc. 75`= quantile(dias_treat_imp_sin_na,.75,na.rm=T),
        `max`=max(dias_treat_imp_sin_na,na.rm=T))
}


We decided to discard the entries with more than 1095 days of treatment (n=718), except for intermediate entries of users with more than one treatment (that is to say, users that had a treatment before and after the treatment that lasted more than 1095 days) (n=71; users=71).


rows_more_1095d_no_int_treat_discarded<-
CONS_C1_df_dup_JUL_2020_cons14b%>%
  dplyr::filter(dias_treat_imp_sin_na>1095)%>%
  dplyr::filter(!row %in% unlist(cases_w_intermediate_entries_w_more_1095d[,"row"]))%>%
  dplyr::select(row)

CONS_C1_df_dup_JUL_2020_cons14b%>%
  dplyr::mutate(discarded_more_1095d_treat=ifelse(row %in% unlist(rows_more_1095d_no_int_treat_discarded[,"row"]),1,0))%>%
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(discarded_more_1095d_treat_by_hash=sum(discarded_more_1095d_treat,na.rm = T))%>%
  dplyr::ungroup()%>%
  dplyr::mutate(obs=case_when(discarded_more_1095d_treat_by_hash>0~glue::glue("{obs};4.98.HASH w/ cases w/ >1095 d of treat"),TRUE~obs))%>%  
  dplyr::filter(!row %in% unlist(rows_more_1095d_no_int_treat_discarded[,"row"]))%>%
  dplyr::select(-discarded_more_1095d_treat,-discarded_more_1095d_treat_by_hash)%>%
  assign("CONS_C1_df_dup_JUL_2020_cons15",., envir = .GlobalEnv)
  ##CONS_C1_df_dup_JUL_2020_cons15 %>%  dplyr::filter(grepl('4.99.', obs)) 
#%>% View() %>% dplyr::group_by(obs) %>% summarise(n=n()) %>% dplyr::filter(n,grepl('4.99.', obs)) %>% View()
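The selection of entries to discard can be sketched on made-up data. The example follows the ordering used in the code above (entries ranked from the most recent admission backwards, so that a long entry is retained only when it is not the user's most recent one); the toy rows and the column `dias_treat` are hypothetical, and base R is used to keep the example dependency-free.

```r
# Toy data (hypothetical): user "u1" has a long entry followed by a later one.
toy <- data.frame(
  row        = 1:5,
  hash_key   = c("u1", "u1", "u1", "u2", "u3"),
  fech_ing   = as.Date(c("2010-01-01", "2011-06-01", "2016-01-01",
                         "2012-01-01", "2015-05-01")),
  dias_treat = c(200, 1500, 300, 2000, 400),
  stringsAsFactors = FALSE
)
max_days <- 1095

# Rank entries within each user, 1 = most recent admission.
rank_recent <- ave(-as.numeric(toy$fech_ing), toy$hash_key, FUN = rank)

too_long  <- toy$dias_treat > max_days
# Long entries that are not the user's most recent entry are retained.
keep_long <- too_long & rank_recent > 1
discard   <- too_long & !keep_long

result <- toy[!discard, ]
print(result)
```

In the toy data the long entry of "u1" is kept because a later treatment exists, while the single long entry of "u2" is dropped.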


Information on Diagnoses

We ordered the ICD-10 and DSM-IV diagnoses of each case according to how many different diagnoses the case had. We started by recoding the sub-categories. Once we had every distinct sub-category, we looked for cases with no available sub-categories. If they only had a diagnosis “under study” (“En estudio”), we kept this general diagnosis (category). It was not possible to work with sub-categories alone, because a large share of the general diagnoses did not have any sub-category. For example, in the first DSM-IV column, 43% of the cases had a general diagnosis but no related sub-category, and the same happened with ICD-10 (63%). This led us to treat diagnostic categories as unrelated to sub-categories once these steps were done, in order to avoid losing information. DSM-IV and ICD-10 diagnoses and sub-categories are presented as separate lists of unique values by treatment, independently of one another.


Additionally, we generated a variable to detect whether a user had at least one ICD-10 (CIE-10) diagnosis (cie_10). The same was done for DSM-IV diagnoses (dsm_iv). We also tidied some secondary variables, such as physical diagnoses and other diagnoses. Lastly, we generated several variables that count how many different diagnoses each entry had.


dg_trs_psiq_sub_dsm_iv_or_cat<-CONS_C1_df_dup_JUL_2020_cons15%>% 
  dplyr::mutate(dg_trs_psiq_sub_dsm_iv_or=stringr::str_trim(as.character(dg_trs_psiq_sub_dsm_iv_or)))%>% 
  janitor::tabyl(dg_trs_psiq_sub_dsm_iv_or)%>% data.frame()%>% 
  dplyr::select(dg_trs_psiq_sub_dsm_iv_or)%>% 
  dplyr::filter(!dg_trs_psiq_sub_dsm_iv_or %in% c(NA))%>% unlist()%>% as.character() 

sub_dsm_iv_to_cie_10_comp_table <- readxl::read_excel("G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/sub_dsm_iv_to_cie_10_comp_table.xlsx", col_names = c("original","mod"))
#TO KEEP THE GENERAL CATEGORIES UP TO DATE:  
cat_dsm_iv_desde_sub_dsm_iv<-
  CONS_C1_df_dup_JUN_2020%>%
  janitor::tabyl(dg_trs_psiq_sub_dsm_iv_or,dg_trs_psiq_dsm_iv_or)%>%
  melt()%>%#glimpse()
  data.frame()%>%
  dplyr::arrange(dg_trs_psiq_sub_dsm_iv_or,desc(value))%>%
  dplyr::group_by(dg_trs_psiq_sub_dsm_iv_or)%>%
  slice(1)%>%
  dplyr::mutate(variable=str_trim(variable))%>%
  dplyr::mutate(dg_trs_psiq_sub_dsm_iv_or=str_trim(dg_trs_psiq_sub_dsm_iv_or))
Warning in melt(.): The melt generic in data.table has been passed a tabyl
and will attempt to redirect to the relevant reshape2 method; please note that
reshape2 is deprecated, and this redirection is now deprecated as well. To
continue using melt methods from reshape2 while both libraries are attached,
e.g. melt.list, you can prepend the namespace like reshape2::melt(.). In the
next version, this warning will become an error.
Using dg_trs_psiq_sub_dsm_iv_or as id variables
sub_cie_10_to_cie_10_comp_table <-
  CONS_C1_df_dup_JUN_2020%>%
  janitor::tabyl(dg_trs_psiq_sub_cie_10_or,dg_trs_psiq_cie_10_or)%>%
  melt()%>%#glimpse()
  data.frame()%>%
  dplyr::arrange(dg_trs_psiq_sub_cie_10_or,desc(value))%>%
  group_by(dg_trs_psiq_sub_cie_10_or)%>%
  slice(1)%>%
  dplyr::mutate(variable=str_trim(variable))
Warning in melt(.): The melt generic in data.table has been passed a tabyl
and will attempt to redirect to the relevant reshape2 method; please note that
reshape2 is deprecated, and this redirection is now deprecated as well. To
continue using melt methods from reshape2 while both libraries are attached,
e.g. melt.list, you can prepend the namespace like reshape2::melt(.). In the
next version, this warning will become an error.
Using dg_trs_psiq_sub_cie_10_or as id variables
sub_dsm_iv_to_cie_10_comp_table <-
  cat_dsm_iv_desde_sub_dsm_iv%>%
  dplyr::left_join(sub_dsm_iv_to_cie_10_comp_table,by=c("dg_trs_psiq_sub_dsm_iv_or"="original"))
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

dg_trs_psiq_sub_cie_10_or_cat<-CONS_C1_df_dup_JUL_2020_cons15%>% 
  dplyr::mutate(dg_trs_psiq_sub_cie_10_or=stringr::str_trim(as.character(dg_trs_psiq_sub_cie_10_or)))%>% 
  janitor::tabyl(dg_trs_psiq_sub_cie_10_or)%>% data.frame()%>% 
  select(dg_trs_psiq_sub_cie_10_or)%>% 
  dplyr::filter(!dg_trs_psiq_sub_cie_10_or %in% c(NA))%>% unlist()%>% as.character() 
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
CONS_C1_df_dup_JUL_2020_cons15%>%
  dplyr::mutate(across(c(contains("psiq_sub_dsm_iv"),contains("psiq_sub_cie_10"),
                         diagnostico_trs_fisico,otros_probl_at_sm_or),~stringr::str_trim(as.character(.))))%>%
  #TO CONVERT THE DSM DIAGNOSES INTO THEIR ICD-10 EQUIVALENTS IN BRACKETS
  dplyr::left_join(dplyr::select(sub_dsm_iv_to_cie_10_comp_table,dg_trs_psiq_sub_dsm_iv_or,mod),by=c("dg_trs_psiq_sub_dsm_iv_or"="dg_trs_psiq_sub_dsm_iv_or"))%>%
  dplyr::left_join(dplyr::select(sub_dsm_iv_to_cie_10_comp_table,dg_trs_psiq_sub_dsm_iv_or,mod),by=c("x2_dg_trs_psiq_sub_dsm_iv_or"="dg_trs_psiq_sub_dsm_iv_or"))%>%
  dplyr::left_join(dplyr::select(sub_dsm_iv_to_cie_10_comp_table,dg_trs_psiq_sub_dsm_iv_or,mod),by=c("x3_dg_trs_psiq_sub_dsm_iv_or"="dg_trs_psiq_sub_dsm_iv_or"))%>%
  dplyr::left_join(dplyr::select(sub_dsm_iv_to_cie_10_comp_table,dg_trs_psiq_sub_dsm_iv_or,mod),by=c("x4_dg_trs_psiq_sub_dsm_iv_or"="dg_trs_psiq_sub_dsm_iv_or"))%>%
  dplyr::select(-contains("psiq_sub_dsm_iv"))%>%
  dplyr::rename("dg_trs_psiq_sub_dsm_iv_or"="mod.x", "x2_dg_trs_psiq_sub_dsm_iv_or"="mod.y", 
                "x3_dg_trs_psiq_sub_dsm_iv_or"="mod.x.x", "x4_dg_trs_psiq_sub_dsm_iv_or"="mod.y.y")%>%
  #TO EXPLORE HOW THEY ARE BEING RECODED
    #dplyr::filter(row=="117796")%>%
    #dplyr::select(contains("dsm_iv"))
  #glimpse()
  dplyr::mutate(mod_cie_10_or =pmap_chr(select(.,contains("psiq_cie_10")), ~toString2(unique(na.omit(c(...))))))%>%
  dplyr::mutate(mod_dsm_iv_or =pmap_chr(select(.,contains("psiq_dsm_iv")), ~toString2(unique(na.omit(c(...))))))%>%
  dplyr::mutate(mod_sub_dsm_iv_or =pmap_chr(select(.,contains("psiq_sub_dsm_iv")), ~toString2(unique(na.omit(c(...))))))%>%
  dplyr::mutate(mod_sub_cie_10_or =pmap_chr(select(.,contains("psiq_sub_cie_10")), ~toString2(unique(na.omit(c(...))))))%>%
  
  dplyr::mutate(mod_cie_10_or= sub("Trastornos de los hábitos y del control de los impulsos;", "Trastornos de los hábitos y del control de los impulsos(F63);",mod_cie_10_or))%>%
  dplyr::mutate(mod_cie_10_or= sub("Trastornos de los hábitos y del control de los impulsos$", "Trastornos de los hábitos y del control de los impulsos(F63)",mod_cie_10_or))%>%
  dplyr::mutate(mod_cie_10_or= sub("; Sin trastorno\\(NA\\)$",replacement= "",mod_cie_10_or,ignore.case=T,perl=T))%>%
  dplyr::mutate(mod_cie_10_or= sub("^Sin trastorno\\(NA\\); ",replacement= "",mod_cie_10_or))%>%
  dplyr::mutate(mod_cie_10_or= sub(";Sin trastorno\\(NA\\);",replacement= ";",mod_cie_10_or))%>%
  dplyr::mutate(mod_cie_10_or= stringr::str_replace_all(mod_cie_10_or, "; Sin trastorno\\(NA\\);", ";"))%>%
  
  dplyr::mutate(mod_dsm_iv_or= sub("; Sin trastorno$",replacement= "",mod_dsm_iv_or,ignore.case=T,perl=T))%>%
  dplyr::mutate(mod_dsm_iv_or= sub("^Sin trastorno; ",replacement= "",mod_dsm_iv_or,ignore.case=T,perl=T))%>%
  dplyr::mutate(mod_dsm_iv_or= sub(";Sin trastorno;",replacement= ";",mod_dsm_iv_or,ignore.case=T,perl=T))%>%
  dplyr::mutate(mod_dsm_iv_or= stringr::str_replace_all(mod_dsm_iv_or, "; Sin trastorno;", ";"))%>%
  
  # `==` compares strings literally, so the parentheses must not be regex-escaped here
  dplyr::mutate(mod_cie_10_or=dplyr::case_when(mod_cie_10_or=="Sin trastorno(NA); En estudio(NA)"~"En estudio(NA)",mod_cie_10_or=="En estudio(NA); Sin trastorno(NA)"~"En estudio(NA)",TRUE~mod_cie_10_or))%>%
  dplyr::mutate(mod_dsm_iv_or=dplyr::case_when(mod_dsm_iv_or=="Sin trastorno; En estudio"~"En estudio",
                                               mod_dsm_iv_or=="En estudio; Sin trastorno"~"En estudio",
                                               TRUE~mod_dsm_iv_or))%>%
  
       dplyr::mutate(across(c("mod_dsm_iv_or","mod_cie_10_or","diagnostico_trs_fisico","otros_probl_at_sm_or"),~str_count(., pattern = ";")+1,.names="cnt_{col}"))%>% 

   assign("CONS_C1_df_dup_JUL_2020_cons15b",., envir = .GlobalEnv)
# To review the distributions
#CONS_C1_df_dup_JUL_2020_cons15b%>% dplyr::select(row,contains("cnt"))%>% names()
#CONS_C1_df_dup_JUL_2020_cons15b%>% janitor::tabyl(mod_cie_10_or)%>% guardar_tablas("revision_cie_10_sub")
#CONS_C1_df_dup_JUL_2020_cons15b%>% janitor::tabyl(mod_dsm_iv_or)%>% guardar_tablas("revision_dsm_iv_sub")

#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
# Split into columns
CONS_C1_df_dup_JUL_2020_cons15b%>%
  # Diagnoses under study ("En estudio") carry no subcategories
  #janitor::tabyl(mod_sub_dsm_iv_or)
  #----> "En estudio" should always come first
  # when there are no diagnostic subcategories and there is an "En estudio"
  tidyr::separate(mod_sub_dsm_iv_or, c("dg_trs_psiq_sub_dsm_iv_or", "x2_dg_trs_psiq_sub_dsm_iv_or","x3_dg_trs_psiq_sub_dsm_iv_or","x4_dg_trs_psiq_sub_dsm_iv_or"), 
                  extra = "merge", fill = "warn", sep="; ")%>%
  
  dplyr::mutate(dg_trs_psiq_dsm_iv_or=ifelse(dg_trs_psiq_dsm_iv_or==""|dg_trs_psiq_dsm_iv_or=="Sin trastorno(NA)",NA_character_,dg_trs_psiq_dsm_iv_or))%>%
   #_#_#_#_#
  # To see which kinds of cases are available and whether they contain repeated characters.
  #_#_#_#_#
  #dplyr::select(contains("sub_dsm_iv"))%>%View()
  #dplyr::filter(!is.na(dg_trs_psiq_sub_dsm_iv_or),dg_trs_psiq_sub_dsm_iv_or!="",!is.na(x4_dg_trs_psiq_sub_dsm_iv_or))%>% View()
  tidyr::separate(mod_sub_cie_10_or, c("dg_trs_psiq_sub_cie_10_or", "x2_dg_trs_psiq_sub_cie_10_or","x3_dg_trs_psiq_sub_cie_10_or","x4_dg_trs_psiq_sub_cie_10_or"), 
                  extra = "merge", fill = "warn", sep="; ")%>%
  
  tidyr::separate(mod_dsm_iv_or, c("dg_trs_psiq_dsm_iv_or", "x2_dg_trs_psiq_dsm_iv_or","x3_dg_trs_psiq_dsm_iv_or","x4_dg_trs_psiq_dsm_iv_or"), 
                  extra = "merge", fill = "warn", sep="; ")%>%
  tidyr::separate(mod_cie_10_or, c("dg_trs_psiq_cie_10_or", "x2_dg_trs_psiq_cie_10_or","x3_dg_trs_psiq_cie_10_or","x4_dg_trs_psiq_cie_10_or","x5_dg_trs_psiq_cie_10_or"), extra = "merge", fill = "warn", sep="; ")%>%
  
  dplyr::mutate(dg_trs_psiq_cie_10_or=ifelse(dg_trs_psiq_cie_10_or==""|dg_trs_psiq_cie_10_or=="Sin trastorno(NA)",NA_character_,dg_trs_psiq_cie_10_or))%>%
  #_#_#_#_#
  # To see which kinds of cases are available and whether they contain repeated characters.
  #_#_#_#_#
  #dplyr::filter(!is.na(dg_trs_psiq_sub_cie_10_or),dg_trs_psiq_sub_cie_10_or!="",!is.na(x4_dg_trs_psiq_sub_cie_10_or))%>% 
  #dplyr::select(contains("sub_cie_10"))%>%View()
  assign("CONS_C1_df_dup_JUL_2020_cons15c",., envir = .GlobalEnv)
Warning: Expected 4 pieces. Missing pieces filled with `NA` in 109752 rows [1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
Warning: Expected 4 pieces. Missing pieces filled with `NA` in 109748 rows [1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
Warning: Expected 4 pieces. Missing pieces filled with `NA` in 109753 rows [1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
Warning: Expected 5 pieces. Missing pieces filled with `NA` in 109754 rows [1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
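The warnings above arise because most rows hold fewer diagnoses than the number of target columns, so tidyr::separate() pads the missing pieces with NA. Passing fill = "right" instead of fill = "warn" applies the same padding silently. The following is a minimal base-R sketch of that padding on made-up strings; the helper split_fixed() is hypothetical and, unlike extra = "merge", it truncates extra pieces instead of merging them.

```r
# Hypothetical helper: split "; "-delimited strings into a fixed number of
# columns, padding short rows with NA on the right.
split_fixed <- function(x, n_cols, sep = "; ") {
  pieces <- strsplit(x, sep, fixed = TRUE)
  padded <- lapply(pieces, function(p) {
    length(p) <- n_cols  # extends with NA (or truncates) to n_cols elements
    p
  })
  out <- do.call(rbind, padded)
  colnames(out) <- paste0("x", seq_len(n_cols))
  out
}

# Toy values, not taken from the C1 dataset
dx <- c("A; B", "A", "A; B; C; D")
m <- split_fixed(dx, 4)
m
```

Only the third toy row fills all four columns; the other two are padded, which is the situation the warnings report for roughly 109,750 rows.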
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#::#:#:#:#:#:#:#
# Standardize other categories
CONS_C1_df_dup_JUL_2020_cons15c%>% 
  #janitor::tabyl(diagnostico_trs_fisico)
  dplyr::mutate(diagnostico_trs_fisico= sub("; Sin trastorno$",replacement= "",diagnostico_trs_fisico,ignore.case =T,perl=T))%>%
  dplyr::mutate(diagnostico_trs_fisico= sub("^Sin trastorno; ",replacement= "",diagnostico_trs_fisico,ignore.case =T,perl=T))%>%
  dplyr::mutate(diagnostico_trs_fisico= sub(";Sin trastorno;",replacement= ";",diagnostico_trs_fisico,ignore.case =T,perl=T))%>%
  dplyr::mutate(diagnostico_trs_fisico= stringr::str_replace_all(diagnostico_trs_fisico, "; Sin trastorno;", ";"))%>%
  #janitor::tabyl(diagnostico_trs_fisico)
  dplyr::mutate(otros_probl_at_sm_or= sub("; Sin otros problemas de salud mental$",replacement= "",otros_probl_at_sm_or,ignore.case = T, perl = T))%>%
  dplyr::mutate(otros_probl_at_sm_or= sub("^Sin otros problemas de salud mental; ",replacement= "",otros_probl_at_sm_or,ignore.case = T, perl = T))%>%
  dplyr::mutate(otros_probl_at_sm_or= sub(";Sin otros problemas de salud mental;",replacement= ";",otros_probl_at_sm_or,ignore.case = T, perl = T))%>%
  dplyr::mutate(otros_probl_at_sm_or= stringr::str_replace_all(otros_probl_at_sm_or, "; Sin otros problemas de salud mental;", ";"))%>%
  #janitor::tabyl(otros_probl_at_sm_or)
  dplyr::select(-contains("x6_dg"))%>%
  dplyr::mutate(dg_trs_psiq_sub_dsm_iv_or=dplyr::case_when(dg_trs_psiq_sub_dsm_iv_or==""~NA_character_,TRUE~dg_trs_psiq_sub_dsm_iv_or))%>%
  dplyr::mutate(dg_trs_psiq_sub_cie_10_or= ifelse(grepl("^$|^ $", dg_trs_psiq_sub_cie_10_or)==TRUE, NA,dg_trs_psiq_sub_cie_10_or))%>%
  dplyr::mutate(dg_trs_psiq_dsm_iv_or= ifelse(grepl("^$|^ $", dg_trs_psiq_dsm_iv_or)==TRUE, NA,dg_trs_psiq_dsm_iv_or))%>%
  assign("CONS_C1_df_dup_JUL_2020_cons16",., envir = .GlobalEnv)

sin_mostrar=1
if (sin_mostrar=="00"){
  invisible(c("THIS IS TO SEE HOW IT BEHAVES"))
CONS_C1_df_dup_JUL_2020_cons15d%>%
  dplyr::select(row,dg_trs_psiq_cie_10_or,dg_trs_psiq_sub_cie_10_or,x2_dg_trs_psiq_cie_10_or,x2_dg_trs_psiq_sub_cie_10_or,x3_dg_trs_psiq_cie_10_or,x3_dg_trs_psiq_sub_cie_10_or,x4_dg_trs_psiq_cie_10_or,x4_dg_trs_psiq_sub_cie_10_or)%>% 
  #dplyr::filter(as.character(row) %in% c("29875", "25234"))%>% #29875
  #dplyr::filter(as.character(row) %in% c("89191", "78137", "74707", "67292"))%>%
  #dplyr::filter(as.character(row) %in% c("35696", "26802", "25956", "21873", "12865"))%>%
  View()
}

#dg_trs_psiq_sub_cie_10_or, dg_trs_psiq_dsm_iv_or


# Standardize other categories
#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#::#:#:#:#:#:#:##:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#::#:#:#:#:#:#:#
CONS_C1_df_dup_JUL_2020_cons16%>%
dplyr::mutate(across(c(dg_trs_psiq_cie_10_or,x2_dg_trs_psiq_cie_10_or,x3_dg_trs_psiq_cie_10_or,x4_dg_trs_psiq_cie_10_or,x5_dg_trs_psiq_cie_10_or),~dplyr::case_when(grepl("En estudio",as.character(.),ignore.case = T)~1,grepl("Sin trastorno",as.character(.),ignore.case = T)~0,is.na(.)~0,TRUE~0),.names = "{col}_mod1a"))%>%
  dplyr::mutate(total_cie_10_en_est = base::rowSums(dplyr::select(.,ends_with("_mod1a"))))%>%  
  dplyr::mutate(across(c(dg_trs_psiq_cie_10_or,x2_dg_trs_psiq_cie_10_or,x3_dg_trs_psiq_cie_10_or,x4_dg_trs_psiq_cie_10_or,x5_dg_trs_psiq_cie_10_or),~dplyr::case_when(grepl("En estudio",as.character(.),ignore.case = T)~0,grepl("Sin trastorno",as.character(.),ignore.case = T)~0,is.na(.)~0,TRUE~1),.names = "{col}_mod2a"))%>%
  dplyr::mutate(total_cie_10_dg = base::rowSums(dplyr::select(.,ends_with("_mod2a"))))%>%  
    
  dplyr::mutate(cie_10=dplyr::case_when(total_cie_10_dg>0 & total_cie_10_en_est>0~"Diagnosticado/a (uno en estudio)",
                       total_cie_10_dg>0 & total_cie_10_en_est==0~"Diagnosticado/a (sin otros registros)",
                       total_cie_10_dg==0 & total_cie_10_en_est>0~"En estudio (sin diagnosticados)",
                       TRUE~"Sin información diagnóstica"))%>%
  
dplyr::mutate(across(c(dg_trs_psiq_dsm_iv_or,x2_dg_trs_psiq_dsm_iv_or,x3_dg_trs_psiq_dsm_iv_or,x4_dg_trs_psiq_dsm_iv_or),~dplyr::case_when(grepl("En estudio",as.character(.))~1,grepl("Sin trastorno",as.character(.))~0,is.na(.)~0,TRUE~0),.names = "{col}_mod1b"))%>%
  dplyr::mutate(total_dsm_iv_en_est = base::rowSums(dplyr::select(.,ends_with("_mod1b"))))%>%  
dplyr::mutate(across(c(dg_trs_psiq_dsm_iv_or,x2_dg_trs_psiq_dsm_iv_or,x3_dg_trs_psiq_dsm_iv_or,x4_dg_trs_psiq_dsm_iv_or),~dplyr::case_when(grepl("En estudio",as.character(.),ignore.case = T)~0,grepl("Sin trastorno",as.character(.),ignore.case = T)~0,is.na(.)~0,TRUE~1),.names = "{col}_mod2b"))%>%
  dplyr::mutate(total_dsm_iv_dg = base::rowSums(dplyr::select(.,ends_with("_mod2b"))))%>%  
  
  dplyr::mutate(dsm_iv=dplyr::case_when(total_dsm_iv_dg>0 & total_dsm_iv_en_est>0~"Diagnosticado/a (uno en estudio)",
                       total_dsm_iv_dg>0 & total_dsm_iv_en_est==0~"Diagnosticado/a (sin otros registros)",
                       total_dsm_iv_dg==0 & total_dsm_iv_en_est>0~"En estudio (sin diagnosticados)",
                       TRUE~"Sin información diagnóstica"))%>%
  dplyr::select(-ends_with("_mod2b"),-ends_with("_mod2a"),-ends_with("_mod1a"),-ends_with("_mod1b"),
                -total_cie_10_dg,-total_cie_10_en_est,-total_dsm_iv_dg,-total_dsm_iv_en_est)%>%
  
assign("CONS_C1_df_dup_JUL_2020_cons17",., envir = .GlobalEnv)
CONS_C1_df_dup_JUL_2020_cons17%>%
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(fech_ing_next_treat=dplyr::lag(fech_ing_num))%>%
  dplyr::mutate(diff_bet_treat=fech_ing_next_treat-fech_egres_num)%>%
  dplyr::mutate(id_centro_sig_trat=dplyr::lag(id_centro)) %>%
  dplyr::mutate(tipo_plan_sig_trat=dplyr::lag(tipo_de_plan_2)) %>%
  dplyr::mutate(tipo_programa_sig_trat=dplyr::lag(tipo_de_programa_2)) %>%
  dplyr::mutate(senda_sig_trat=dplyr::lag(senda)) %>%
  dplyr::ungroup()%>%
  # Keep only the relevant cases, i.e. those that can be compared with a following entry; the rest are not of interest
  dplyr::mutate(menor_60_dias_diff=case_when(diff_bet_treat<60~1,TRUE~0))%>%
  dplyr::mutate(menor_45_dias_diff= ifelse(diff_bet_treat<45,1,0))%>%
  dplyr::mutate(obs_cambios=case_when(id_centro_sig_trat!=id_centro~"1.1.cambio centro",TRUE~""))%>%
  dplyr::mutate(obs_cambios=case_when(tipo_plan_sig_trat!=tipo_de_plan_2~glue::glue("{obs_cambios};1.2.cambio tipo plan"),TRUE~obs_cambios))%>%
  dplyr::mutate(obs_cambios=case_when(tipo_programa_sig_trat!=tipo_de_programa_2~glue::glue("{obs_cambios};1.3.cambio tipo programa"),TRUE~obs_cambios))%>%
  dplyr::mutate(obs_cambios=case_when(senda_sig_trat!=senda~glue::glue("{obs_cambios};1.4.cambio senda"),TRUE~obs_cambios))%>%
  dplyr::mutate(obs_cambios_ninguno=case_when(obs_cambios==""~1,TRUE~0))%>%
  dplyr::mutate(obs_cambios_num=case_when(id_centro_sig_trat!=id_centro~1,TRUE~0))%>%
  dplyr::mutate(obs_cambios_num=case_when(tipo_plan_sig_trat!=tipo_de_plan_2~obs_cambios_num+1,TRUE~obs_cambios_num))%>%
  dplyr::mutate(obs_cambios_num=case_when(tipo_programa_sig_trat!=tipo_de_programa_2~obs_cambios_num+1,TRUE~obs_cambios_num))%>%
  dplyr::mutate(obs_cambios_num=case_when(senda_sig_trat!=senda~obs_cambios_num+1,TRUE~obs_cambios_num))%>%
  dplyr::mutate(obs_cambios_num=as.numeric(obs_cambios_num))%>%
  dplyr::mutate(obs_cambios_fac=obs_cambios_num)%>%
  dplyr::mutate(menor_45_dias_diff= recode(as.character(menor_45_dias_diff),"0"=">= 45 Days of Difference Between Entries","1"="<45 Days of Difference Between Entries"))%>%
  dplyr::mutate(menor_60_dias_diff= recode(as.character(menor_60_dias_diff),"0"=">= 60 Days of Difference Between Entries","1"="<60 Days of Difference Between Entries"))%>%
  dplyr::mutate(obs_cambios_ninguno= recode(as.character(obs_cambios_ninguno),"0"="At least 1 Change w/ the Next Entry","1"="No Changes w/ the Next Entry"))%>%
  dplyr::mutate_at(c('menor_45_dias_diff','menor_60_dias_diff','obs_cambios_ninguno','obs_cambios_fac'),~as.factor(.))%>%  
  assign("CONS_C1_df_dup_JUL_2020_cons18",., envir = .GlobalEnv)
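
The gap computation above can be illustrated on toy data. This is a minimal base-R sketch (the pipeline itself uses dplyr), assuming rows are sorted most recent first within each hash_key, so that lagging the admission date yields the admission of the following treatment. The values and the helper lag_by() are made up for illustration.

```r
# Toy data: one user, three treatments, sorted most recent first
# (an assumption about how rows are ordered within hash_key).
treat <- data.frame(
  hash_key       = c("u1", "u1", "u1"),
  fech_ing_num   = c(500, 230, 100),  # admission date as a number of days
  fech_egres_num = c(560, 470, 150)   # discharge date as a number of days
)

# Base-R equivalent of dplyr::lag() applied within each group
lag_by <- function(x, g) ave(x, g, FUN = function(v) c(NA, head(v, -1)))

# Admission date of the following treatment, and days elapsed since discharge
treat$fech_ing_next_treat <- lag_by(treat$fech_ing_num, treat$hash_key)
treat$diff_bet_treat <- treat$fech_ing_next_treat - treat$fech_egres_num

# 1 if the following treatment started fewer than 45 days after discharge
treat$menor_45_dias_diff <- ifelse(treat$diff_bet_treat < 45, 1, 0)
treat
```

The most recent treatment has no following entry, so its gap and flag stay NA. This mirrors menor_45_dias_diff, which uses ifelse(), whereas menor_60_dias_diff is built with case_when(..., TRUE~0) and therefore codes those rows as 0.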

Normalize Progression of Educational Attainment by Users


Once treatments were distinguished from one another, we noticed that many users reported a given level of educational attainment in one treatment but a lower level in a later one. Since attainment should not decrease over time, we focused on the 2,337 users whose records showed such inconsistencies across their treatments.
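
The detection step can be sketched as follows. This is a minimal base-R illustration on invented data, assuming attainment is coded 1 (more than secondary) to 3 (primary or less) and that rows are sorted most recent first within each user. The helper lag_by() is hypothetical.

```r
# Toy data: attainment codes per treatment, most recent first within user.
# Lower code = higher attainment (1 = more than secondary, 3 = primary or less).
edu <- data.frame(
  hash_key = c("u1", "u1", "u1", "u2", "u2"),
  esc_num  = c(2, 1, 2, 1, 2)
)

# Base-R equivalent of dplyr::lag() applied within each group
lag_by <- function(x, g) ave(x, g, FUN = function(v) c(NA, head(v, -1)))

# Attainment code reported in the following (later) treatment
edu$esc_num_lag <- lag_by(edu$esc_num, edu$hash_key)

# A user is inconsistent if a later treatment reports lower attainment
# (a higher code) than the current one.
inconsistent <- unique(edu$hash_key[which(edu$esc_num_lag > edu$esc_num)])
inconsistent
```

User u1 goes from code 1 back to code 2 in the most recent treatment and is flagged; user u2 moves from code 2 to code 1, a plausible progression, and is not.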


hash_key_escolaridad<-
CONS_C1_df_dup_JUL_2020_cons18%>%
    dplyr::group_by(hash_key)%>%
    dplyr::mutate(esc_num=as.numeric(substring(as.character(escolaridad), 1, 1)))%>%
    dplyr::mutate(esc_num_lag=lag(esc_num))%>%
    dplyr::mutate(fech_ing_lag=lag(fech_ing))%>%
    dplyr::mutate(escolaridad_lag=lag(escolaridad))%>%
    dplyr::filter(esc_num_lag>esc_num)%>% # The later treatment has lower educational attainment than the current one
    dplyr::select(row,hash_key,fech_ing, esc_num_lag,esc_num,escolaridad,escolaridad_lag,fech_ing_lag)%>%
          dplyr::distinct(hash_key)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

hash_key_escolaridad_rules<-
CONS_C1_df_dup_JUL_2020_cons18%>%
dplyr::filter(hash_key %in% unlist(hash_key_escolaridad))%>%
  dplyr::mutate(esc_num=as.numeric(substring(as.character(escolaridad), 1, 1)))%>%
  dplyr::mutate(nas_ed=ifelse(is.na(escolaridad),1,0))%>%
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(n_dis_esc=n_distinct(escolaridad),n=n(), rn_esc=row_number(),
                nas_ed=sum(nas_ed),nas_ed=ifelse(nas_ed>0,1,0), min_ed=max(esc_num, na.rm=T))%>%
  dplyr::ungroup()%>%
  dplyr::group_by(hash_key,escolaridad)%>%
  dplyr::mutate(n_hash_esc=n())%>%
  dplyr::ungroup()%>%
  dplyr::select(row,hash_key,fech_ing,esc_num,escolaridad,n_dis_esc,n,n_hash_esc,rn_esc,nas_ed,min_ed)%>%
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  # Generate comparison variables
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(esc_num_lag_post=lag(esc_num),esc_num_lead_ant=lead(esc_num))%>% 
  dplyr::mutate(ed_problematico_post=dplyr::case_when(esc_num_lag_post>esc_num~1,TRUE~0))%>%
  dplyr::mutate(ed_problematico_ant=dplyr::case_when(esc_num_lead_ant>esc_num~1,TRUE~0))%>%
  dplyr::mutate(ant_ed_problematico_post=lead(ed_problematico_post))%>%
  dplyr::mutate(post_ed_problematico_ant=lag(ed_problematico_ant))%>%
  dplyr::mutate(the_rank= rank(-n_hash_esc, ties.method = "min"))%>% #"max"
  dplyr::mutate(mfv=ifelse(the_rank==1,esc_num,NA_real_))%>%
  dplyr::mutate(the_rank_post=lag(the_rank))%>% 
  dplyr::mutate(the_rank_ant=lead(the_rank))%>% 
  dplyr::mutate(mfv=max(mfv, na.rm=T))%>%
  dplyr::ungroup()%>%
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  # A. If there are no missing values, 3 or more distinct attainment levels, more than 3 cases, and a problematic case in the middle that is not the most frequent value, replace it with the later value. Unless the error is at the end (a.2): in that case, check whether it is in the first row (most recent case) and has no later value, and replace with <
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::mutate(ed_a= dplyr::case_when(nas_ed==0 & n_dis_esc>2 & n>3 & ant_ed_problematico_post==1 & post_ed_problematico_ant==1 & the_rank>1 & the_rank_post==1~esc_num_lag_post,TRUE~NA_real_))%>%
  # Error at the end
  dplyr::mutate(ed_a2= dplyr::case_when(nas_ed==0 & n_dis_esc>2 & n>3 & ant_ed_problematico_post==1 & is.na(post_ed_problematico_ant) & the_rank>1 & rn_esc==1 & the_rank_ant==1~esc_num_lead_ant,TRUE~NA_real_))%>%
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  # B. The user has more than one case with a particular attainment level, i.e. a single most frequent value.
   # Missing values matter here: a gap is not replaced by a fixed value. E.g., a case from 2011 that returns in 2016, e1b2708112875d77f7d3d1bd87c10164
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::mutate(ed_b= dplyr::case_when(n_dis_esc==n-1 & n_hash_esc==1 & the_rank >1 & nas_ed==0~mfv,TRUE~NA_real_))%>%
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  # C. If all cases differ, even when there are missing values, choose the maximum code (here equivalent to the minimum attainment) for every HASH
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::mutate(ed_c= dplyr::case_when(n_dis_esc==n~min_ed,TRUE~NA_real_))%>% # & nas_ed==0
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  # If there are ties and the later case has the_rank==1, it could be replaced with the later value.
  # D. (e.g. 154608 04b09b0ad8f6d8cbf9871594cf10f7e5) Has 2 distinct values, although not in the middle: 3 identical cases (university), with an anomaly at the end (secondary)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::mutate(ed_d= dplyr::case_when(ant_ed_problematico_post==1 & post_ed_problematico_ant==1 & esc_num_lag_post==esc_num_lead_ant ~esc_num_lag_post,TRUE~NA_real_))%>%
  # CAUTION: with ties, a later case with rank 1 may itself be problematic
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  # D2. (e.g. 112484 e1b2708112875d77f7d3d1bd87c10164) The case in the middle is lower, there are NAs and there is no restriction on n (ant_ed_problematico_post==1 & post_ed_problematico_ant==1 & the_rank>1). For the least frequent case, esc_num_lag_post==esc_num_lead_ant (the same value in the previous and in the following treatment).
  # NOTE: check that there is no tie in the most frequent value.
    dplyr::mutate(ed_d2= dplyr::case_when(ed_problematico_post==1 & ed_problematico_ant==1 & esc_num_lag_post==esc_num_lead_ant ~esc_num_lag_post,TRUE~NA_real_))%>%
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  # E. (e.g. 157971  fe456c2da940fa8ece275e509a634242) One case equals the previous one with the same frequency, so there is a tie in mfv and rn_esc>1 (it is not the last case). The tie is not a concern for this purpose.
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::mutate(ed_e= dplyr::case_when(rn_esc==n & ed_problematico_post==1 ~esc_num_lag_post,TRUE~NA_real_))%>%
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  # F. (e.g. 140181 485a5ff2e2c7aa0943c292e337ea1411) Distinct values, more than 3 cases, no NAs; the most recent case should be the problematic one, with the highest rank number (it is not the most frequent value).
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_  
  dplyr::mutate(ed_f= dplyr::case_when(n_dis_esc==2 & n>3 & rn_esc==1 & ant_ed_problematico_post==1 & the_rank>1~esc_num_lead_ant,TRUE~NA_real_))%>%
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_  
   # G. (e.g. 114450 e1b2708112875d77f7d3d1bd87c10164) Has NAs in the first record, which are ignored. Has 3 distinct values, so an attainment level in the middle should be replaced. This can be used if the neighbouring values coincide
  
  # F. (e.g. 39890, faa8263a28f47dbaefa77c326ea96b2a) No missing values, inconsistent intermediate events, ties present
  ## See how to generate more than one mfv; perhaps take the later mfv to check whether it is also rank 1.
  ## It may work without rank==1 as long as the case after the problematic one has rank==1.
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  # H. There are 3 distinct attainment levels (e.g. 72013 bf70334c0a891bee1d016bc530eece8d)
  
  ## It is unclear what should change here, because the whole set of variables must be considered. The condition would be, for each user: there are problematic cases relative to the later treatment, there are no missing values, and that problematic later case is also rank==1
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  #i. (EJ. 4e6b041e4a0a40e7c6bc8a8a65f842d6  132495)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_  
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_    
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#### Check whether any cases are not covered by a rule ###
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::group_by(hash_key)%>%
dplyr::mutate(across(c("ed_a", "ed_a2", "ed_b", "ed_c","ed_d","ed_d2","ed_e","ed_f"),~ifelse(abs(max(.,na.rm=T)) == Inf,NA,.), .names= "{col}_n"))%>% 
  dplyr::select(-ends_with("_n"))%>%
  dplyr::ungroup() %>% 

#hash_key_escolaridad_rules%>%
    dplyr::mutate(total_mean = base::rowSums(dplyr::select(., ed_a, ed_a2, ed_b, ed_c, ed_d, ed_d2, ed_e, ed_f), na.rm=T))%>%
  dplyr::group_by(hash_key) %>% 
    dplyr::mutate(total_mean=max(total_mean,na.rm=T)) %>% 
  ungroup() %>% 
  #dplyr::filter(total_mean==0 ) %>%  # Decide whether to remove this once everything is ready

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#$G. There is a tie in the number of distinct values. The first case has a lower attainment level (secondary), followed by 2 cases above secondary, and the last (most recent) entry goes from university (1) to secondary completed or less (2). Change the last one because it shows no progression.
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#7776e5c9335808de3e8389ac2c685056 93804 & 133259  
  # 4 cases, 2 distinct attainment levels: started with secondary (2), then university (1) twice, then back to secondary
#7776e5c9335808de3e8389ac2c685056 78986 42125    
  dplyr::mutate(ed_g= dplyr::case_when(total_mean==0 & n_dis_esc==2 & rn_esc==1 & ant_ed_problematico_post==1~esc_num_lead_ant,TRUE~NA_real_))%>%
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  #:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
  #|||°°°°°°°°°°|||°°°°°°°°°°|||°°°°°°°°°°|||°°°°°°°°°°||||||°°°°°°°°°°|||°°°°°°°°°°|||°°°°°°°°°°|||°°°°°°°°°°|||
  #:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:#:
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
# Some users must be examined as a whole (e.g. 156063): check whether the first case equals the last within a user. If so, replace all values --> that are not NAs
  # Check anomalous cases with 2 in a row.
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#$H. Only 2 distinct values (n= n_dis_esc >2), with treatments in between. In the example there are 5 cases: 2 entries with university (1), then 2 entries that look contradictory (e.g. 2 cases with secondary completed or less)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#144b81fab8cd7e2796a133085c6d8d16 131587 & 44293
  dplyr::group_by(hash_key) %>% 
    dplyr::mutate(mix_max_esc=ifelse(dplyr::first(esc_num)==dplyr::last(esc_num) & n_dis_esc==2,1,0),esc_num_last=dplyr::last(esc_num))%>% 
    dplyr::mutate(ties=ifelse(n_distinct(n_hash_esc)>1,0,1), ties=max(ties,na.rm=T))%>%
    dplyr::ungroup()%>%
  dplyr::mutate(ed_h= dplyr::case_when(total_mean==0 & mix_max_esc==1 ~esc_num_last,TRUE~NA_real_))%>% # use the min/max value as the replacement for the remaining cases
  dplyr::mutate(ed_i= dplyr::case_when(total_mean==0 & ties==1 & mix_max_esc==0 & n_dis_esc==2~min_ed,TRUE~NA_real_))%>% #p
#
  dplyr::mutate(ed_j= dplyr::case_when(total_mean==0 & is.na(ed_i) & is.na(ed_h)~min_ed,TRUE~NA_real_))%>%
#49868 0f1218127d5370806310f2ccc6784302
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(across(c("ed_h", "ed_i", "ed_j"),~ifelse(abs(max(.,na.rm=T)) == Inf,NA,.), .names= "{col}_n"))%>% 
  dplyr::mutate(across(c("ed_a", "ed_a2", "ed_b", "ed_c","ed_d","ed_d2","ed_e","ed_f","ed_h","ed_i","ed_j"),~ifelse(.>0,1,0), .names= "{col}_N-dis"))%>% 
  dplyr::select(-ends_with("_n"))%>%
  dplyr::ungroup() %>%
  dplyr::mutate(no_suggestions = base::rowSums(dplyr::select(., "ed_a_N-dis", "ed_a2_N-dis","ed_b_N-dis","ed_c_N-dis","ed_d_N-dis","ed_d2_N-dis","ed_e_N-dis","ed_f_N-dis","ed_h_N-dis","ed_i_N-dis","ed_j_N-dis"), na.rm=T))%>%
  dplyr::mutate(total_mean2 = base::rowSums(dplyr::select(., ed_h, ed_i, ed_j), na.rm=T))%>%
  dplyr::select(-ends_with("_N-dis")) %>% 
  dplyr::group_by(hash_key) %>% 
    dplyr::mutate(total_mean2=max(total_mean2,na.rm=T)) %>% 
    dplyr::mutate(no_suggestions= max(no_suggestions,na.rm = T)) %>% 
  dplyr::ungroup()
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
# 4 cases, 2 distinct attainment levels: the first 2 are university, the next 2 are secondary, so there is a tie. The most vulnerable level should be kept.
#bde6cd6a3f35291441df7ae58f1ba4bb 118109 103929     ant_ed_problematico_post esc_num_lead_ant  

hash_key_escolaridad_rules_final<-
hash_key_escolaridad_rules%>%
  dplyr::select(row,no_suggestions, starts_with("ed_"))
  #dplyr::filter(no_suggestions>1) %>% 
#2,285 #6,320
  CONS_C1_df_dup_JUL_2020_cons18%>%
  dplyr::mutate(obs=case_when(row %in% as.numeric(unlist(hash_key_escolaridad_rules_final$row))~glue::glue("{obs};4.99. Education Changed"),TRUE~obs))%>%
  dplyr::left_join(hash_key_escolaridad_rules_final, by="row")%>%
  dplyr::select(-no_suggestions,-ed_problematico_post, -ed_problematico_ant) %>% 
  dplyr::mutate(esc_num=as.numeric(substring(as.character(escolaridad), 1, 1)))%>% #janitor::tabyl(esc_num)
  dplyr::mutate(esc_num=dplyr::case_when(!is.na(ed_a)~ed_a,
                                         !is.na(ed_a2)~ed_a2,
                                         !is.na(ed_b)~ed_b,
                                         !is.na(ed_c)~ed_c,
                                         !is.na(ed_d)~ed_d,
                                         !is.na(ed_d2)~ed_d2,
                                         !is.na(ed_e)~ed_e,
                                         !is.na(ed_f)~ed_f,
                                         !is.na(ed_g)~ed_g,
                                         !is.na(ed_h)~ed_h,
                                         !is.na(ed_i)~ed_i,
                                         !is.na(ed_j)~ed_j,
                                         TRUE~esc_num)) %>% #janitor::tabyl(esc_num)
  dplyr::mutate(escolaridad_rec=dplyr::case_when(esc_num==1~"1-Mayor a Ed Secundaria",
                                                esc_num==2~"2-Ed Secundaria Completa o Menor",
                                                esc_num==3~"3-Ed Primaria Completa o Menor",
                                                TRUE~NA_character_)) %>% 
  dplyr::select(-starts_with("ed_"),-esc_num) %>% 
  assign("CONS_C1_df_dup_JUL_2020_cons19",., envir = .GlobalEnv)
# esc_num     n     percent valid_percent
#       1 19473 0.177420824     0.1781382
#       2 60782 0.553792048     0.5560312
#       3 29059 0.264760013     0.2658305
#      NA   442 0.004027115            NA
  # After applying the filter
#   esc_num     n     percent valid_percent
#       1 18675 0.170150151     0.1708303
#       2 60843 0.554347826     0.5565638
#       3 29801 0.271520464     0.2726059
#      NA   437 0.003981559            NA

hash_key_escolaridad2<-
CONS_C1_df_dup_JUL_2020_cons19%>%
    dplyr::group_by(hash_key)%>%
    dplyr::mutate(esc_num=as.numeric(substring(as.character(escolaridad_rec), 1, 1)))%>%
    dplyr::mutate(esc_num_lag=lag(esc_num))%>%
    dplyr::mutate(fech_ing_lag=lag(fech_ing))%>%
    dplyr::mutate(escolaridad_lag=lag(escolaridad_rec))%>%
    dplyr::filter(esc_num_lag>esc_num)%>% # The later treatment has lower educational attainment than the current one
    dplyr::select(row,hash_key,fech_ing, esc_num_lag,esc_num,escolaridad,escolaridad_lag,fech_ing_lag)%>%
          dplyr::distinct(hash_key)
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

hash_key_escolaridad_rules2<-
CONS_C1_df_dup_JUL_2020_cons19%>%
dplyr::filter(hash_key %in% unlist(hash_key_escolaridad2))%>%
  dplyr::mutate(esc_num=as.numeric(substring(as.character(escolaridad_rec), 1, 1)))%>%
  dplyr::mutate(nas_ed=ifelse(is.na(escolaridad_rec),1,0))%>%
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(n_dis_esc=n_distinct(escolaridad_rec),n=n(), rn_esc=row_number(),
                nas_ed=sum(nas_ed),nas_ed=ifelse(nas_ed>0,1,0), min_ed=max(esc_num, na.rm=T))%>%
  dplyr::ungroup()%>%
  dplyr::group_by(hash_key,escolaridad_rec)%>%
  dplyr::mutate(n_hash_esc=n())%>%
  dplyr::ungroup()%>%
  dplyr::select(row,hash_key,fech_ing,esc_num,escolaridad,escolaridad_rec,n_dis_esc,n,n_hash_esc,rn_esc,nas_ed,min_ed)%>%
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  # Generate comparison variables
  #_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::group_by(hash_key)%>%
  dplyr::mutate(esc_num_lag_post=lag(esc_num),esc_num_lead_ant=lead(esc_num))%>% 
  dplyr::mutate(ed_problematico_post=dplyr::case_when(esc_num_lag_post>esc_num~1,TRUE~0))%>%
  dplyr::mutate(ed_problematico_ant=dplyr::case_when(esc_num_lead_ant>esc_num~1,TRUE~0))%>%
  dplyr::mutate(ant_ed_problematico_post=lead(ed_problematico_post))%>%
  dplyr::mutate(post_ed_problematico_ant=lag(ed_problematico_ant))%>%
  dplyr::mutate(the_rank= rank(-n_hash_esc, ties.method = "min"))%>% #"max"
  dplyr::mutate(mfv=ifelse(the_rank==1,esc_num,NA_real_))%>%
  dplyr::mutate(the_rank_post=lag(the_rank))%>% 
  dplyr::mutate(the_rank_ant=lead(the_rank))%>% 
  dplyr::mutate(mfv=max(mfv, na.rm=T))%>%
  dplyr::ungroup()%>%
  
 dplyr::group_by(hash_key) %>% 
    dplyr::mutate(esc_num_first=dplyr::first(esc_num),esc_num_last=dplyr::last(esc_num))%>% 
    dplyr::ungroup()%>%
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::mutate(ed2_a= dplyr::case_when(esc_num_first==min_ed & nas_ed==0 & n_dis_esc==2~min_ed,TRUE~NA_real_))%>% #
  dplyr::group_by(hash_key) %>% 
  dplyr::mutate(ed2_a_n=ifelse(abs(max(ed2_a,na.rm=T)) == Inf,NA,ed2_a)) %>% 
  ungroup() %>% 
#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_
  dplyr::mutate(ed2_b= dplyr::case_when(is.na(ed2_a_n) & nas_ed==0 & ant_ed_problematico_post==1 & post_ed_problematico_ant==1 & n_dis_esc==2~esc_num_lead_ant,TRUE~NA_real_))
Warning in max(ed2_a, na.rm = T): no non-missing arguments to max; returning
-Inf
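
The warning above is raised whenever a group contains only missing values, because max(x, na.rm = TRUE) then returns -Inf; the pipeline detects this afterwards with abs(max(...)) == Inf. As a hedged alternative, a small wrapper can return NA directly and avoid the warning. The name safe_max() is hypothetical and not part of the original code.

```r
# Hypothetical helper: maximum that returns NA (instead of -Inf plus a
# warning) when every value is missing.
safe_max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

# Toy checks
res_some <- safe_max(c(NA, 2, 3))             # at least one observed value
res_none <- safe_max(c(NA_real_, NA_real_))   # all values missing
```

Used inside a grouped mutate(), such a wrapper would make the later -Inf check unnecessary.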
  CONS_C1_df_dup_JUL_2020_cons19%>%
  dplyr::left_join(hash_key_escolaridad_rules2[,c("row","ed2_a","ed2_b")], by="row")%>%
  dplyr::mutate(esc_num=as.numeric(substring(as.character(escolaridad_rec), 1, 1)))%>% #janitor::tabyl(esc_num)
  dplyr::mutate(esc_num=dplyr::case_when(!is.na(ed2_a)~ed2_a,
                                         !is.na(ed2_b)~ed2_b,
                                         TRUE~esc_num)) %>% #janitor::tabyl(esc_num)
  dplyr::mutate(escolaridad_rec=dplyr::case_when(esc_num==1~"1-Mayor a Ed Secundaria",
                                                esc_num==2~"2-Ed Secundaria Completa o Menor",
                                                esc_num==3~"3-Ed Primaria Completa o Menor",
                                                TRUE~NA_character_)) %>% 
  dplyr::select(-starts_with("ed2_"),-esc_num) %>% 
  assign("CONS_C1_df_dup_JUL_2020_cons19b",., envir = .GlobalEnv)
  
if(
  CONS_C1_df_dup_JUL_2020_cons19b%>%
    dplyr::group_by(hash_key)%>%
    dplyr::mutate(esc_num=as.numeric(substring(as.character(escolaridad_rec), 1, 1)))%>%
    dplyr::mutate(esc_num_lag=lag(esc_num))%>%
    dplyr::mutate(fech_ing_lag=lag(fech_ing))%>%
    dplyr::mutate(escolaridad_lag=lag(escolaridad_rec))%>%
    dplyr::filter(esc_num_lag>esc_num)%>% #The later treatment has lower educational attainment than the current one
    dplyr::select(row,hash_key,fech_ing, esc_num_lag,esc_num,escolaridad,escolaridad_lag,fech_ing_lag)%>%
    dplyr::distinct(hash_key)%>% nrow()
>0){"there are still levels of educational attainment left to normalize"}
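
The check above can be illustrated on toy data. This is only a sketch under stated assumptions: the data are invented, rows are sorted by descending admission date within each user (so that lag() points to the later treatment, as the lag/lead naming in the code suggests), and a higher esc_num code means lower educational attainment.

```r
library(dplyr)

# Invented toy data: esc_num 1 = highest attainment, 3 = lowest
toy_esc <- tibble::tibble(
  hash_key = c("a", "a", "b", "b"),
  fech_ing = as.Date(c("2015-01-01", "2012-01-01", "2016-01-01", "2013-01-01")),
  esc_num  = c(3, 2, 1, 2)
)

flagged <- toy_esc %>%
  arrange(hash_key, desc(fech_ing)) %>%      # most recent treatment first (assumption)
  group_by(hash_key) %>%
  mutate(esc_num_lag = lag(esc_num)) %>%     # code of the later treatment
  filter(esc_num_lag > esc_num) %>%          # later treatment reports lower attainment
  ungroup() %>%
  distinct(hash_key)

if (nrow(flagged) > 0) {
  message("there are still levels of educational attainment left to normalize")
}
```

In this toy case only user "a" is flagged, because the later treatment reports a lower attainment than the earlier one.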

Generate values of the trajectories of users


We created variables that summarize each trajectory in terms of days in treatment. The variable cum_dias_trat_sin_na accumulates the days treated by each patient up to and including a given treatment, and mean_cum_dias_trat_sin_na divides that total by the number of treatments so far. We also added the analogous variables for the days elapsed between treatments (cum_diff_bet_treat and mean_cum_diff_bet_treat). Together, these variables let us identify changes in treatment length and in time to readmission along the trajectory of each user in SENDA between 2010 and 2019.
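
As a minimal sketch of how these cumulative variables behave, consider the invented toy data below (the values are made up for illustration; only the variable names come from the dataset). Missing values are treated as zero before accumulating, as in the pipeline.

```r
library(dplyr)

# Invented toy data: one row per treatment
toy_traj <- tibble::tibble(
  hash_key = c("a", "a", "a", "b", "b"),
  fech_ing = as.Date(c("2010-01-01", "2011-03-01", "2013-06-01",
                       "2012-01-01", "2015-01-01")),
  dias_treat_imp_sin_na = c(100, 50, NA, 200, 30),
  diff_bet_treat        = c(324, 772, NA, 896, NA)
)

toy_cum <- toy_traj %>%
  arrange(hash_key, fech_ing) %>%
  group_by(hash_key) %>%
  mutate(
    rn_hash = row_number(),   # order of the treatment within the user
    cum_dias_trat_sin_na      = cumsum(tidyr::replace_na(dias_treat_imp_sin_na, 0)),
    mean_cum_dias_trat_sin_na = cum_dias_trat_sin_na / rn_hash,
    cum_diff_bet_treat        = cumsum(tidyr::replace_na(diff_bet_treat, 0)),
    mean_cum_diff_bet_treat   = cum_diff_bet_treat / rn_hash
  ) %>%
  ungroup()
```

For user "a", the cumulative days are 100, 150 and 150, and the running means are 100, 75 and 50: the third treatment has no recorded days, so it adds nothing to the total but still counts in the denominator.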


library(magrittr)

Attaching package: 'magrittr'
The following object is masked from 'package:purrr':

    set_names
The following object is masked from 'package:tidyr':

    extract
CONS_C1_df_dup_JUL_2020_cons19b %>% 
  dplyr::arrange(hash_key,fech_ing) %>%
  dplyr::mutate(keep_tipo_de_plan_2=tipo_de_plan_2)%>% 
  dplyr::group_by(hash_key) %>% 
  dplyr::mutate(rn_hash_discard=row_number())%>% 
  dplyr::mutate(rn_hash=row_number())%>% 
  dplyr::mutate(n_hash=n())%>% 
  dplyr::mutate(cum_dias_trat_sin_na=cumsum(tidyr::replace_na(dias_treat_imp_sin_na, 0)))%>%
  dplyr::mutate(keep_cum_dias_trat_sin_na=cumsum(tidyr::replace_na(dias_treat_imp_sin_na, 0)))%>%
  dplyr::mutate(mean_cum_dias_trat_sin_na=cum_dias_trat_sin_na/rn_hash)%>%
  dplyr::mutate(keep_mean_cum_dias_trat_sin_na=cum_dias_trat_sin_na/rn_hash)%>% 
  dplyr::mutate(cum_diff_bet_treat=cumsum(tidyr::replace_na(diff_bet_treat, 0)))%>% 
  dplyr::mutate(keep_cum_diff_bet_treat=cumsum(tidyr::replace_na(diff_bet_treat, 0)))%>% 
  dplyr::mutate(mean_cum_diff_bet_treat=cum_diff_bet_treat/rn_hash)%>%
  dplyr::mutate(keep_mean_cum_diff_bet_treat=mean_cum_diff_bet_treat)%>%
  dplyr::ungroup()%>% 

  #":":":":":":"":":":":": Test
  #dplyr::select(hash_key,rn_hash_discard,rn_hash,n_hash,cum_dias_trat_sin_na_rev,mean_cum_dias_trat_sin_na_rev,cum_diff_bet_treat_rev,n_hash,diff_bet_treat)%>% dplyr::filter(hash_key=="0093cc44fee21895b9e55f3d84e51928") %>% View()
  tidyr::pivot_wider(
    names_from =  rn_hash_discard, 
    names_sep="_",
    values_from = c(tipo_de_plan_2, 
                    dias_treat_imp_sin_na, 
                    diff_bet_treat,
                    cum_dias_trat_sin_na,
                    mean_cum_dias_trat_sin_na, 
                    cum_diff_bet_treat,
                    mean_cum_diff_bet_treat)
  )%>% #glimpse()

  #dplyr::filter(hash_key=="0093cc44fee21895b9e55f3d84e51928") %>% View()
   dplyr::group_by(hash_key)%>%
  dplyr::mutate_at(vars(tipo_de_plan_2_1:mean_cum_diff_bet_treat_10),~suppressWarnings(max(as.character(.),na.rm=T)))%>%
  dplyr::ungroup() %>%
  
  dplyr::mutate_at(vars(dias_treat_imp_sin_na_1:mean_cum_diff_bet_treat_10),~as.numeric(.))%>%
  dplyr::mutate(diff_bet_treat=fech_ing_next_treat-fech_egres_num)%>%
  dplyr::mutate(dias_treat_imp_sin_na=fech_egres_num-fech_ing_num)%>%
  #dplyr::select(hash_key,n_hash,fech_ing, starts_with("tipo_de_plan_2"),starts_with("dias_treat_imp_sin_na"),starts_with("diff_bet_treat_"),starts_with("mean_cum_dias_trat_sin_na_"),starts_with("mean_cum_dias_trat_sin_na_"))%>% dplyr::filter(hash_key=="0093cc44fee21895b9e55f3d84e51928")%>% View()
  # funs() is deprecated since dplyr 0.8; a formula lambda does the same renaming
  dplyr::rename_at(.vars = vars(matches("^keep_")),
            .funs = ~sub("^keep_", "", .))%>%
      assign("CONS_C1_df_dup_JUL_2020_cons20",., envir = .GlobalEnv)
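
The wide-format step above can be sketched on toy data. This is an illustration under assumptions, not the original code: the data are invented, and tidyr::fill() is swapped in for the max(as.character(.)) trick to copy each per-treatment value to every row of the same user.

```r
library(dplyr)
library(tidyr)

# Invented toy data: days of treatment per entry
toy_long <- tibble(
  hash_key = c("a", "a", "b"),
  dias     = c(100, 50, 200)
)

toy_wide <- toy_long %>%
  group_by(hash_key) %>%
  mutate(rn_hash_discard = row_number(),   # consumed by pivot_wider()
         rn_hash         = row_number()) %>% # kept as a regular column
  ungroup() %>%
  pivot_wider(names_from = rn_hash_discard,
              values_from = dias,
              names_prefix = "dias_") %>%
  group_by(hash_key) %>%
  fill(starts_with("dias_"), .direction = "downup") %>%  # spread values across the user's rows
  ungroup()
```

Each row keeps its own rn_hash, while dias_1 and dias_2 hold the first and second treatment of the user on every row; users with a single treatment keep NA in dias_2.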

no_mostrar=0
if(no_mostrar==1){
CONS_C1_df_dup_JUL_2020_cons20 %>% 
    dplyr::group_by(hash_key) %>% 
    dplyr::mutate(n_hash=n())%>% 
   dplyr::ungroup()%>% 
    dplyr::filter(n_hash>3) %>% 
    dplyr::mutate_at(vars(dias_treat_imp_sin_na:cum_diff_bet_treat_rev_10),~suppressWarnings(max(as.character(.),na.rm=T)))%>%
  dplyr::select(hash_key,n_hash,fech_ing, starts_with("tipo_de_plan_2"),starts_with("dias_treat_imp_sin_na"),starts_with("diff_bet_treat_"),starts_with("mean_cum_dias_trat_sin_na_"),starts_with("mean_cum_dias_trat_sin_na_"),starts_with("mean_cum_diff_bet_treat_"))%>% View()
}
 # CONS_C1_df_dup_JUL_2020_cons19 %>% dplyr::mutate(n_hash=n())%>% dplyr::filter(n_hash>1)%>% select(diff_bet_treat_1) %>%  summary()
CONS_C1_df_dup_JUL_2020_cons20%>%
  dplyr::mutate_at(vars(contains("dg_trs_psiq_cie_10"),contains("dg_trs_psiq_dsm_iv"),contains("dg_trs_psiq_sub_cie_10"),contains("dg_trs_psiq_sub_dsm_iv"),contains("tipo_de_plan"),c('tipo_de_programa_2','nombre_centro','tipo_centro','servicio_de_salud','senda','tipo_centro_derivacion','usuario_tribunal_trat_droga','motivodeegreso_mod_imp','macrozona','nombre_region','comuna_residencia_cod','identidad_de_genero','origen_ingreso_mod','x_se_trata_mujer_emb','usuario_tribunal_trat_droga','tiene_menores_de_edad_a_cargo','ha_estado_embarazada_egreso','discapacidad','opcion_discapacidad','escolaridad','edad_al_ing_grupos','nacionalidad','sexo_2','embarazo','estado_conyugal_2','edad_grupos','freq_cons_sus_prin','via_adm_sus_prin_act','etnia_cor','nacionalidad_2','etnia_cor_2','sus_ini_2_mod','sus_ini_3_mod','sus_ini_mod','con_quien_vive','estatus_ocupacional','cat_ocupacional','sus_principal_mod', 'tipo_de_vivienda_mod','tenencia_de_la_vivienda_mod','rubro_trabaja_mod','otras_sus1_mod','otras_sus2_mod','otras_sus3_mod','dg_trs_cons_sus_or','diagnostico_trs_fisico','otros_probl_at_sm_or','ano_bd_first','ano_bd_last','centro_muj','cie_10','dsm_iv','escolaridad_rec','con_quien_vive_rec','edad_ini_sus_prin_grupos')),~as.factor(.))%>%
  dplyr::mutate_at(c('id_centro_sig_trat','tipo_plan_sig_trat','tipo_programa_sig_trat','senda_sig_trat','at_least_one_cont_entry','id_centro'),~as.factor(.))%>%
  dplyr::mutate_at(c('tipo_de_plan_2_concat_a'),~as.character(.))%>%
  dplyr::mutate_at(c('fech_ing','fech_egres_imp'),~as.Date(.))%>%
  
    assign("CONS_C1_df_dup_JUL_2020",., envir = .GlobalEnv)

Consolidation of the dataset and its variables

  metadata(CONS_C1_df_dup_JUL_2020)$name <- "Agreement 1 SENDA"
  metadata(CONS_C1_df_dup_JUL_2020)$description <- "Information about Agreement 1 of SENDA and MINSAL. Contains information about treatments. (*) Intermediate events are collapsed and concatenated in some variables. Criteria to select values from entries that were collapsed into single treatments: Wide format(a), Maximum/Last value(b), Minimum/First value(c), Kept more vulnerable category(d), Same value(e), Largest treatment(f), Favored dgs.-a(g), Sum values(h). For 'tipo_de_plan_2', 'dias_treat_imp_sin_na', 'diff_bet_treat', 'cum_dias_trat_sin_na', 'mean_cum_dias_trat_sin_na', 'cum_diff_bet_treat' & 'mean_cum_diff_bet_treat', 10 variables were generated per variable, one for each treatment of the user, from the first (1) to the last (10)"

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_  
  
codebook::var_label(CONS_C1_df_dup_JUL_2020) <- list(
row= 'Numerador de los eventos presentes en la Base de Datos (Último registro)/Events in the Dataset (Last Entry)',
row_cont_entries= 'Numerador de los eventos presentes en la Base de Datos(*)/Events in the Dataset(*)',
hash_key= 'Codificación del RUT/Masked Identifier (RUT)',
hash_rut_completo= 'HASH alternativo, en el escenario en que se asuma que el individuo al que se le codificó el RUT presente mayor edad/Alternative HASH-Key',
id= 'Codigo Identificación de SENDA/SENDAs ID',
id_mod= 'ID de SENDA para Presentación en Página Web (enmascara caracteres 5 y 6)/SENDAs ID (mask characters 5 & 6)',
fech_ing= 'Fecha de Ingreso a Tratamiento (Primera Entrada)/Date of Admission to Treatment (First Entry)',
fech_egres_imp= 'Fecha de Egreso (Imputados KNN & Lógico) del Último Registro(b)/Date of Discharge (Imputed KNN & Logic) of the Last Entry(b)',
tipo_de_plan_2= 'Tipo de Plan del Último Registro/Type of Plan of the Last Entry',
tipo_de_plan_2_largest_treat= 'Tipo de Plan del Registro Más Largo entre entradas intermedias(f)/Type of Plan of the Largest Entry Among Intermediate Entries(f)',
tipo_de_plan_2_concat_a= 'Tipo de Plan(*)/Type of Plan(*)', 
tipo_de_programa_2= 'Tipo de Programa del Registro Más Largo entre Entradas Intermedias/Type of Program of the Largest Entry Among Intermediate Entries',
id_centro= 'ID de Centro(b)/Center ID(b)',
nombre_centro= 'Nombre del Centro de Tratamiento(*)/Treatment Center(*)',
id_centro_concat_a= 'ID de Centro(*)/Center ID(*)',
tipo_centro= 'Tipo de Centro del Último Registro/Type of Center of the Last Entry',
servicio_de_salud= 'Servicio de Salud(*)/Health Service(*)',
senda= 'SENDA del Último Registro/SENDA of the Last Entry',
numero_de_hijos_mod= 'Número de Hijos (Valor Max.)/Number of Children (Max. Value)',
num_hijos_trat_res_mod= 'Número de Hijos para Ingreso a Tratamiento Residencial del Último Registro/Number of Children to Residential Treatment of the Last Entry',
tipo_centro_derivacion= 'Tipo de Centro al que el Usuario es Derivado del Último Registro(b)/Type of Center to which the User is Referred in the Last Entry(b)',
motivodeegreso_mod_imp= 'Motivo de Egreso (con abandono temprano y tardío)(Imputados KNN & Lógico) del Último Registro(b)/Cause of Discharge (with late and early withdrawal)(Imputed KNN & Logic) of the Last Entry(b)',
macrozona= "Macrozona del Centro del Último Registro(b)/Macrozones of the Center of the Last Entry(b)",
nombre_region= "Región del Centro del Último Registro(b)/Chilean Region of the Center of the Last Entry(b)",
comuna_residencia_cod= "Comuna de Residencia del Último Registro(b)/Municipality or District of Residence of the Last Entry(b)",
fecha_ingreso_a_convenio_senda= 'Fecha de Ingreso a Convenio SENDA (aún no formateada como fecha) (Primera Entrada)/Date of Admission to SENDA Agreement (First Entry)',
identidad_de_genero= 'Identidad de Género (Último Registro)(b)/Gender Identity (Last Entry)(b)',
edad_al_ing= 'Edad a la Fecha de Ingreso a Tratamiento (numérico continuo) (Primera Entrada)/Age at Admission to Treatment (First Entry)',
origen_ingreso_mod= 'Origen de Ingreso (Primera Entrada)/Motive of Admission to Treatment (First Entry)',
x_se_trata_mujer_emb= 'Mujer Embarazada al Ingreso (d)/Pregnant at Admission (d)',
compromiso_biopsicosocial= 'Compromiso Biopsicosocial(d)/Biopsychosocial Involvement(d)',
dg_global_nec_int_soc_or= 'Diagnóstico Global de Necesidades de Integración Social (Al Ingreso)(d)/Global Diagnosis of Social Integration (At Admission)(d)',
dg_nec_int_soc_cap_hum_or= 'Diagnóstico de Necesidades de Integración Social en Capital Humano (Al Ingreso)(d)/Global Diagnosis of Social Integration in Human Capital (At Admission)(d)',
dg_nec_int_soc_cap_fis_or= 'Diagnóstico de Necesidades de Integración Social en Capital Físico (Al Ingreso)(d)/Global Diagnosis of Social Integration in Physical Capital (At Admission)(d)',
dg_nec_int_soc_cap_soc_or= 'Diagnóstico de Necesidades de Integración Social en Capital Social (Al Ingreso)(d)/Global Diagnosis of Social Integration in Social Capital (At Admission)(d)',
usuario_tribunal_trat_droga= 'Usuario de modalidad Tribunales de Tratamiento de Drogas(d)/User of Drug Treatment Courts Modality(d)',
evaluacindelprocesoteraputico= 'Evaluación del Proceso Terapéutico(d)/Evaluation of the Therapeutic Process(d)',
eva_consumo= 'Evaluación al Egreso Respecto al Patrón de consumo(d)/Evaluation at Discharge regarding Consumption Pattern(d)',
eva_fam= 'Evaluación al Egreso Respecto a Situación Familiar(d)/Evaluation at Discharge regarding Family Situation(d)',
eva_relinterp= 'Evaluación al Egreso Respecto a Relaciones Interpersonales(d)/Evaluation at Discharge regarding Interpersonal Relations(d)',
eva_ocupacion= 'Evaluación al Egreso Respecto a Situación Ocupacional(d)/Evaluation at Discharge regarding Occupational Status(d)',
eva_sm= 'Evaluación al Egreso Respecto a Salud Mental(d)/Evaluation at Discharge regarding Mental Health(d)',
eva_fisica= 'Evaluación al Egreso Respecto a Salud Física(d)/Evaluation at Discharge regarding Physical Health(d)',
eva_transgnorma= 'Evaluación al Egreso Respecto a Trasgresión a la Norma Social(d)/Evaluation at Discharge regarding Transgression of the Social Norm(d)',
dg_global_nec_int_soc_or_1= 'Diagnóstico Global de Necesidades de Integración Social (Al Egreso)(d)/Global Diagnosis of Social Integration (At Discharge)(d)',
dg_nec_int_soc_cap_hum_or_1= 'Diagnóstico de Necesidades de Integración Social en Capital Humano (Al Egreso)(d)/Global Diagnosis of Social Integration in Human Capital (At Discharge)(d)',
dg_nec_int_soc_cap_fis_or_1= 'Diagnóstico de Necesidades de Integración Social en Capital Físico (Al Egreso)(d)/Global Diagnosis of Social Integration in Physical Capital (At Discharge)(d)',
dg_nec_int_soc_cap_soc_or_1= 'Diagnóstico de Necesidades de Integración Social en Capital Social (Al Egreso)(d)/Global Diagnosis of Social Integration in Social Capital (At Discharge)(d)',
tiene_menores_de_edad_a_cargo= 'Menores de Edad A Cargo(d)/Minor Dependants(d)',
ha_estado_embarazada_egreso= '¿Ha estado embarazada? (al Egreso)(d)/Have you been Pregnant (at Discharge)(d)',
discapacidad= 'Presenta Discapacidad(d)/Disability(d)',
opcion_discapacidad= 'Origen de Discapacidad(d)/Cause of Disability(d)',
escolaridad= 'Escolaridad: Nivel Educacional(d)/Educational Attainment(d)',
escolaridad_rec= 'Escolaridad: Nivel Educacional(d) Normalizado a Progresión de Tratamientos/Educational Attainment(d), Normalized Following the Progression of Treatments',
edad_al_ing_grupos= 'Edad a la Fecha de Ingreso a Tratamiento en Grupos(c)/Age at Admission to Treatment In Groups(c)',
nacionalidad= 'Nacionalidad/Nationality',
sexo_2= 'Sexo Usuario/Sex of User',
embarazo= 'Embarazo al Ingreso(c)/Pregnant at Admission(c)',
fech_nac= 'Fecha de Nacimiento/Date of Birth',
edad_ini_cons= 'Edad de Inicio de Consumo/Age of Onset of Drug Use',
edad_ini_sus_prin=  'Edad de Inicio de Consumo Sustancia Principal/Age of Onset of Drug Use of Primary Substance',
edad_ini_sus_prin_grupos=  'Edad de Inicio de Consumo Sustancia Principal (en Grupos)/Age of Onset of Drug Use of Primary Substance (in Groups)',
estado_conyugal_2= 'Estado Conyugal/Marital Status',
edad_grupos= 'Edad agrupada/Age in groups',
freq_cons_sus_prin= 'Frecuencia de Consumo de la Sustancia Principal (30 días previos a la admisión)(f)/Frequency of Consumption of the Primary or Main Substance (30 days previous to admission)(f)',
via_adm_sus_prin_act= 'Vía de Administración de la Sustancia Principal (Se aplicaron criterios de limpieza)(f)/Route of Administration of the Primary or Main Substance (Tidy)(f)',
etnia_cor= 'Etnia/Ethnic Group',
nacionalidad_2= 'Segunda Nacionalidad/Second Nationality',
etnia_cor_2= 'Etnia (2)/Second Ethnic Group',
sus_ini_2_mod= 'Segunda Sustancia de Inicio(Sólo más frecuentes)/Second Starting Substance',
sus_ini_3_mod= 'Tercera Sustancia de Inicio(Sólo más frecuentes)/Third Starting Substance',
sus_ini_mod= "Sustancia de Inicio (Sólo más frecuentes)/Starting Substance (Only more frequent)",
con_quien_vive= 'Persona con la que vive el Usuario(f)/People that Share Household with the User (Cohabitation Status)(f)',
con_quien_vive_rec= 'Persona con la que vive el Usuario (Recodificada)(f)/People that Share Household with the User (Cohabitation Status)(Recoded)(f)',
estatus_ocupacional= 'Condición Ocupacional(f)/Occupational Status(f)',
cat_ocupacional= 'Categoría Ocupacional(f)/Occupational Category(f)',
sus_principal_mod= 'Sustancia Principal de Consumo (Sólo más frecuentes)(f)/Primary or Main Substance of Consumption at Admission (Only more frequent)(f)',
tipo_de_vivienda_mod= 'Tipo de Vivienda(f)/Type of Housing(f)', 
tenencia_de_la_vivienda_mod= 'Tenencia de la Vivienda(f)/Tenure status of Households(f)',
rubro_trabaja_mod= 'Rubro de Trabajo(f)/Area of Work(f)',
otras_sus1_mod= 'Otras Sustancias (1)(Sólo más frecuentes)(f)/Other Substances (1)(Only more frequent)(f)',
otras_sus2_mod= 'Otras Sustancias (2)(Sólo más frecuentes)(f)/Other Substances (2)(Only more frequent)(f)',
otras_sus3_mod= 'Otras Sustancias (3)(Sólo más frecuentes)(f)/Other Substances (3)(Only more frequent)(f)',
dg_trs_cons_sus_or= 'Diagnóstico de Trastorno por Consumo de Sustancias(d)/Diagnosis of Substance Use Disorder(d)',
dg_trs_psiq_dsm_iv_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios DSM IV(g)/Diagnosis of Psychiatric Disorders, DSM-IV criteria(g)',
dg_trs_psiq_sub_dsm_iv_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios DSM IV (Subclasificacion)(g)/Diagnosis of Psychiatric Disorders, DSM-IV criteria (sub-classification)(g)',
x2_dg_trs_psiq_dsm_iv_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios DSM IV (2)(g)/Diagnosis of Psychiatric Disorders, DSM-IV criteria (2)(g)',
x2_dg_trs_psiq_sub_dsm_iv_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios DSM IV (Subclasificacion) (2)(g)/Diagnosis of Psychiatric Disorders, DSM-IV criteria (sub-classification) (2)(g)',
x3_dg_trs_psiq_dsm_iv_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios DSM IV (3)(g)/Diagnosis of Psychiatric Disorders, DSM-IV criteria (3)(g)',
x3_dg_trs_psiq_sub_dsm_iv_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios DSM IV (Subclasificacion) (3)(g)/Diagnosis of Psychiatric Disorders, DSM-IV criteria (sub-classification) (3)(g)',
dg_trs_psiq_cie_10_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios CIE-10(g)/Diagnosis of Psychiatric Disorders, ICD-10 criteria(g)',
dg_trs_psiq_sub_cie_10_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios CIE-10 (Subclasificacion)(g)/Diagnosis of Psychiatric Disorders, ICD-10 criteria (subclassification)(g)',
x2_dg_trs_psiq_cie_10_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios CIE-10 (2)(g)/Diagnosis of Psychiatric Disorders, ICD-10 criteria (2)(g)',
x2_dg_trs_psiq_sub_cie_10_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios CIE-10 (Subclasificacion) (2)(g)/Diagnosis of Psychiatric Disorders, ICD-10 criteria (subclassification) (2)(g)',
x3_dg_trs_psiq_cie_10_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios CIE-10 (3)(g)/Diagnosis of Psychiatric Disorders, ICD-10 criteria (3)(g)',
x3_dg_trs_psiq_sub_cie_10_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios CIE-10 (Subclasificacion) (3)(g)/Diagnosis of Psychiatric Disorders, ICD-10 criteria (subclassification) (3)(g)',
diagnostico_trs_fisico= 'Diagnóstico de Trastorno Físico(g)/Diagnosis of Physical Disorder(g)',
otros_probl_at_sm_or= 'Otros Problemas de Atención Vinculados a Salud Mental(g)/Other problems linked to Mental Health(g)',
x4_dg_trs_psiq_dsm_iv_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios DSM IV (4)(g)/Diagnosis of Psychiatric Disorders, DSM-IV criteria (4)(g)',
x4_dg_trs_psiq_sub_dsm_iv_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios DSM IV (Subclasificacion)(4)(g)/Diagnosis of Psychiatric Disorders, DSM-IV criteria (sub-classification)(4)(g)',
x4_dg_trs_psiq_cie_10_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios CIE-10 (4)(g)/Diagnosis of Psychiatric Disorders, ICD-10 criteria (4)(g)',
x5_dg_trs_psiq_cie_10_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios CIE-10 (5)(g)/Diagnosis of Psychiatric Disorders, ICD-10 criteria (5)(g)',
x4_dg_trs_psiq_sub_cie_10_or= 'Diagnóstico de Trastorno Psiquiátrico, Criterios CIE-10 (Subclasificacion)(4)(g)/Diagnosis of Psychiatric Disorders, ICD-10 criteria (subclassification)(4)(g)',
ano_bd_first= 'Año de la Base de Datos(c)/Year of the Dataset (Source)(c)',
ano_bd_last= 'Año de la Base de Datos(b)/Year of the Dataset (Source)(b)',
obs= 'Observaciones al Proceso de Limpieza y Estandarización de Casos(e)/Observations to the Process of Data Tidying & Standardization(e)',
obs_concat_a= 'Observaciones al Proceso de Limpieza y Estandarización de Casos(*)/Observations to the Process of Data Tidying & Standardization(*)',
rn_common_treats2= 'Cuenta de Entradas Comunes(b)/Count of Common Entries(b)',
concat_hash_id_treatments='Combination of User & Distinct Entries',
at_least_one_cont_entry= "Casos de Usuarios con más de una entrada después de otra/Cases of users with more than one entry after another one",
senda_concat_a= 'SENDA(*)/SENDA(*)',
tipo_centro_concat_a= 'Tipo de Centro(*)/Type of Center(*)',
fech_ing_num= 'Fecha de Ingreso a Tratamiento (Numérico)(c)/Date of Admission to Treatment (Numeric)(c)',
fech_egres_num= 'Fecha de Egreso (Imputados KNN & Lógico)(Numérico)(b)/Date of Discharge (Imputed KNN & Logic)(Numeric)(b)',
fech_ing_next_treat= 'Fecha de Ingreso a Tratamiento (Numérico)(c) del Tratamiento Posterior/Date of Admission to Treatment (Numeric)(c) of the Next Treatment',
diff_bet_treat= 'Días de diferencia con el Tratamiento Posterior/Days of Difference with the Next Treatment',
id_centro_sig_trat= "ID del Centro del Tratamiento Posterior/Center ID of the Next Treatment",
tipo_plan_sig_trat= "Tipo de Plan del Tratamiento Posterior/Type of Plan of the Next Treatment",
tipo_programa_sig_trat= "Tipo de Programa del Tratamiento Posterior/Type of Program of the Next Treatment", 
senda_sig_trat= "SENDA del Tratamiento Posterior/SENDA of the Next Treatment",
menor_60_dias_diff= 'Menor a 60 días de diferencia con el Tratamiento Posterior/Less than 60 days of difference with the Next Treatment',
menor_45_dias_diff= 'Menor a 45 días de diferencia con el Tratamiento Posterior/Less than 45 days of difference with the Next Treatment',
motivoegreso_derivacion= "Motivo de Egreso= Derivación(b)/Cause of Discharge= Referral(b)",
dias_treat_imp_sin_na= 'Días de Tratamiento (valores perdidos en la fecha de egreso se reemplazaron por la diferencia con 2019-11-13)/Days of Treatment (missing dates of discharge were replaced with difference from 2019-11-13)',
obs_cambios= "Cambios del tratamiento en comparación al Tratamiento Posterior/Changes in treatment compared to the Next Treatment",
obs_cambios_ninguno= "Sin cambios del tratamiento en comparación al Tratamiento Posterior/No changes in treatment compared to the Next Treatment",
obs_cambios_num= "Recuento de cambios del tratamiento en comparación al Tratamiento Posterior/Count of changes in treatment compared to the Next Treatment",
obs_cambios_fac= "Recuento de cambios del tratamiento en comparación al Tratamiento Posterior(factor)/Count of changes in treatment compared to the Next Treatment(factor)",
hash_key_sex_program= 'Usuarios a los que se le ha cambiado el sexo de acuerdo al tipo de plan/Users whose sex was changed according to the type of plan',
centro_muj= 'ID de centro que alude a un centro específico para mujeres/Center ID alludes to a women-specific center',
dsm_iv= 'Diagnóstico DSM-IV (1 o más)/Psychiatric Diagnoses (DSM-IV)(one or more)',
cie_10= 'Diagnóstico CIE-10 (1 o más)/Psychiatric Diagnoses (ICD-10)(one or more)',
abandono_temprano= 'Abandono temprano(<3 meses)/ Early Drop-out(<3 months)',
cnt_mod_dsm_iv_or= 'Recuento de Diagnóstico DSM-IV/Count of Psychiatric Diagnoses (DSM-IV)',
cnt_mod_cie_10_or= 'Recuento de Diagnóstico CIE-10/Count of Psychiatric Diagnoses (ICD-10)',
cnt_diagnostico_trs_fisico= 'Recuento de Diagnóstico de Trastorno Físico/Count of Physical Disorder',
cnt_otros_probl_at_sm_or= 'Recuento de Otros Problemas de Atención Vinculados a Salud Mental/Count of Other problems linked to Mental Health',
cum_dias_trat_sin_na= 'Suma acumulada de Días de Tratamiento por Usuario/Cumulative Days of Treatment by User',
mean_cum_dias_trat_sin_na= 'Promedio acumulado de Días de Tratamiento por Usuario/Cumulative Average Days of Treatment by User',
cum_diff_bet_treat= 'Suma acumulada de Diferencia en Días con Tratamiento Siguiente por Usuario/Cumulative Sum of Days of Difference with the Next Treatment by User',
mean_cum_diff_bet_treat= 'Promedio acumulado de Diferencia en Días entre Tratamientos por Usuario/Cumulative Average Days of Differences Between Treatments By User',
rn_hash= 'Número de Tratamientos por Usuario (menor, tratamiento más antiguo)/Number of Treatments by User (lower number, older treatment)',
n_hash= 'Número total de Tratamientos por Usuario/Total Number of Treatments by User'  
)
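
The selection criteria named in the dataset description (concatenated values, sum, first value, last value) can be illustrated with a short sketch. It relies on assumptions: the data are invented, treat_id is a hypothetical identifier of the collapsed treatment, and the new column names are illustrative; the actual collapsing rules were applied in an earlier stage of the deduplication.

```r
library(dplyr)

# Invented toy data: entries that belong to the same collapsed treatment share treat_id
toy_entries <- tibble::tibble(
  hash_key  = c("a", "a", "a", "b"),
  treat_id  = c(1, 1, 2, 1),                     # hypothetical grouping key
  fech_ing  = as.Date(c("2012-01-01", "2012-04-01", "2014-01-01", "2013-05-01")),
  id_centro = c(10, 11, 10, 20),
  dias      = c(90, 60, 120, 45)
)

collapsed <- toy_entries %>%
  arrange(hash_key, fech_ing) %>%
  group_by(hash_key, treat_id) %>%
  summarise(
    id_centro_concat_a = paste(id_centro, collapse = ";"),  # (*) concatenated entries
    dias_sum           = sum(dias),                         # (h) sum of values
    fech_ing_first     = first(fech_ing),                   # (c) minimum/first value
    id_centro_last     = last(id_centro),                   # (b) maximum/last value
    .groups = "drop"
  )
```

The two entries of user "a" that share treat_id 1 end up as one row with 150 days, the first admission date, the last center ID and both center IDs concatenated.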

#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_#_

no_mostrar_nunca=0
if(no_mostrar_nunca==1){
df_def<-
data.frame(cbind(var_name= names(codebook_data),var_def=data.table(codebook::var_label(codebook_data), keep.rownames = T), type=data.table(sapply(codebook_data, class)),can_be_na=data.table(rep(FALSE,length(names(codebook_data))))))%>%
  dplyr::rename("var_def"="var_def.V1","type"="type.V1","can_be_na"="can_be_na.V1")
}    

#####to_export_labels
table_labels<-
  tibble::rownames_to_column(data.frame(Hmisc::label(CONS_C1_df_dup_JUL_2020)))%>% data.frame() %>%
  dplyr::rename("code" = !!names(.[1]), "label" = !!names(.[2]))%>% data.frame()%>%
  dplyr::mutate(first= "cap label variable")%>%
  dplyr::mutate(final= paste0(first, " ",code,' "',label,'"'))%>%
  dplyr::select(-code,-label,-first)%>%
  rbind('cap save "G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/CONS_C1_df_dup_JUL_2020.dta", replace')%>%
  rbind('cap drop id id_mod nombre_centro consentimiento_informado')%>%
  rbind('cap save "G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/CONS_C1_df_dup_JUL_2020_exp.dta", replace')

table_labels<-
  data.frame(final='use "G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/CONS_C1_df_dup_JUL_2020.dta", clear')%>%
  rbind(table_labels)%>%
  rename("*final"="final") 
  #write.csv2(table_labels,"__labels_to_stata_C1_jun_2020.do",row.names =F)
  write.table(table_labels, file = "G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/SUD_CL/_label_var_to_stata.do", sep = "",
            row.names = FALSE, quote = FALSE,fileEncoding="UTF-8")
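
The export above turns each variable label into one Stata command of the form cap label variable name "label". A minimal sketch of that string construction follows, assuming two invented labels and a temporary output file instead of the project paths.

```r
# Invented labels for illustration; names are variable codes, values are labels
labels <- c(hash_key = "Masked Identifier (RUT)",
            fech_ing = "Date of Admission to Treatment")

# One Stata command per variable
do_lines <- paste0("cap label variable ", names(labels), ' "', labels, '"')

# Write to a temporary .do file (an assumption; the original writes to the project folder)
do_file <- tempfile(fileext = ".do")
writeLines(do_lines, do_file)
written <- readLines(do_file)
```

The cap prefix makes Stata continue silently if a variable is missing, so the same .do file can be run on datasets from which some columns were dropped.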

 # dplyr::filter(label!="") %>%
#  bind_rows(data.frame("code"=c("fech_egres_imp", 
##                                "dias_trat_imp",
#                                "dias_trat_alta_temprana_imp",
#                                "motivodeegreso_mod_imp"), "label"=c("Date of Discharge (Imputed)", 
##                                                                     "Days of Treatment (Imputed)",
#                                                                     "Days of Treatment for Early Withdrawal (Imputed)",
#                                                                     "Cause of Discharge w/ Early or Late Withdrawal (Imputed)"))) %>% 

CONS_C1_df_dup_JUL_2020%>%
  dplyr::arrange(hash_key, desc(fech_ing))%>% 
  rio::export(file = "G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/CONS_C1_df_dup_JUL_2020.dta")

save.image("G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/7.RData")

rm(list=setdiff(ls(), c("CONS_C1_df_dup_JUL_2020","CONS_C1_df_dup_JUN_2020","CONS_C1_df","CONS_C1","CONS_C1_df_dup_FEB_2020","CONS_TOP_df_dup_ENE_2020_prev9","CONS_TOP_df")))

save.image("G:/Mi unidad/Alvacast/SISTRAT 2019 (github)/8.RData")

#Add new variables to the codebook
#Build the codebook
#Add processes to STROBE
#Include olr, and place the other models above it.


cap do _label_var_to_stata.do



sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Chile.1252  LC_CTYPE=Spanish_Chile.1252   
[3] LC_MONETARY=Spanish_Chile.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Chile.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] magrittr_1.5            gridExtra_2.3           radiant.update_1.4.1   
 [4] forcats_0.5.0           purrr_0.3.4             readr_1.3.1            
 [7] tibble_3.0.1            tidyverse_1.3.0         treemapify_2.5.3       
[10] ggiraph_0.7.0           chilemapas_0.2          sf_0.9-3               
[13] finalfit_1.0.1          lsmeans_2.30-0          emmeans_1.4.7          
[16] choroplethrAdmin1_1.1.1 choroplethrMaps_1.0.1   choroplethr_3.6.3      
[19] acs_2.1.4               XML_3.99-0.3            RColorBrewer_1.1-2     
[22] panelr_0.7.3            lme4_1.1-23             Matrix_1.2-18          
[25] dplyr_1.0.0             data.table_1.12.8       codebook_0.9.2         
[28] Statamarkdown_0.4.5     devtools_2.3.0          usethis_1.6.1          
[31] sqldf_0.4-11            RSQLite_2.2.0           gsubfn_0.7             
[34] proto_1.0.0             broom_0.7.0             zoo_1.8-8              
[37] altair_4.0.1            rbokeh_0.5.1            janitor_2.0.1          
[40] plotly_4.9.2.1          kableExtra_1.1.0        Hmisc_4.4-0            
[43] Formula_1.2-3           survival_3.1-12         lattice_0.20-41        
[46] ggplot2_3.3.1           stringr_1.4.0           stringi_1.4.6          
[49] tidyr_1.1.0             knitr_1.29              matrixStats_0.56.0     
[52] boot_1.3-25            

loaded via a namespace (and not attached):
  [1] estimability_1.3        rappdirs_0.3.1          coda_0.19-3            
  [4] acepack_1.4.1           bit64_0.9-7             multcomp_1.4-13        
  [7] rpart_4.1-15            generics_0.0.2          callr_3.4.3            
 [10] TH.data_1.0-10          mice_3.9.0              ggfittext_0.9.0        
 [13] DiagrammeR_1.0.6.1.9000 chron_2.3-55            bit_1.1-15.2           
 [16] webshot_0.5.2           xml2_1.3.2              lubridate_1.7.9        
 [19] httpuv_1.5.4            assertthat_0.2.1        xfun_0.16              
 [22] hms_0.5.3               data.tree_0.7.11        evaluate_0.14          
 [25] promises_1.1.1          fansi_0.4.1             dbplyr_1.4.4           
 [28] readxl_1.3.1            randomizr_0.20.0        DBI_1.1.0              
 [31] tmvnsim_1.0-2           htmlwidgets_1.5.1       jsonvalidate_1.1.0     
 [34] ellipsis_0.3.1          import_1.1.0            crosstalk_1.1.0.1      
 [37] backports_1.1.8         V8_3.1.0                insight_0.8.4          
 [40] markdown_1.1            vctrs_0.3.1             remotes_2.1.1          
 [43] sjlabelled_1.1.5        abind_1.4-5             withr_2.2.0            
 [46] pryr_0.1.4              tigris_0.9.4            rgdal_1.5-8            
 [49] checkmate_2.0.0         ggmap_3.0.0             prettyunits_1.1.1      
 [52] mnormt_2.0.0            cluster_2.1.0           lazyeval_0.2.2         
 [55] crayon_1.3.4            crul_0.9.0              labeling_0.3           
 [58] units_0.6-6             pkgconfig_2.0.3         nlme_3.1-148           
 [61] pkgload_1.1.0           nnet_7.3-14             rlang_0.4.6            
 [64] RJSONIO_1.3-1.4         lifecycle_0.2.0         sandwich_2.5-1         
 [67] httpcode_0.3.0          modelr_0.1.8            cellranger_1.1.0       
 [70] tcltk_4.0.2             rprojroot_1.3-2         shinyFiles_0.8.0.9003  
 [73] carData_3.0-4           reprex_0.3.0            base64enc_0.1-3        
 [76] processx_3.4.2          png_0.1-7               viridisLite_0.3.0      
 [79] rjson_0.2.20            parameters_0.7.0        bitops_1.0-6           
 [82] KernSmooth_2.23-17      visNetwork_2.0.9        pander_0.6.3           
 [85] blob_1.2.1              classInt_0.4-3          maptools_1.0-1         
 [88] jpeg_0.1-8.1            shinyAce_0.4.1          ggeffects_0.14.3       
 [91] scales_1.1.1            memoise_1.1.0           plyr_1.8.6             
 [94] hexbin_1.28.1           compiler_4.0.2          snakecase_0.11.0       
 [97] cli_2.0.2               patchwork_1.0.1         ps_1.3.3               
[100] htmlTable_2.0.1         MASS_7.3-51.6           tidyselect_1.1.0       
[103] highr_0.8               jtools_2.0.5            yaml_2.2.1             
[106] radiant.model_1.3.12    latticeExtra_0.6-29     ggrepel_0.8.2          
[109] grid_4.0.2              rmapshaper_0.4.4        tools_4.0.2            
[112] parallel_4.0.2          rio_0.5.16              RgoogleMaps_1.4.5.3    
[115] rstudioapi_0.11         uuid_0.1-4              foreign_0.8-80         
[118] NeuralNetTools_1.5.2    pdp_0.7.0               gistr_0.5.0            
[121] farver_2.0.3            sjPlot_2.8.4            digest_0.6.25          
[124] shiny_1.5.0             geojsonlint_0.4.0       Rcpp_1.0.4.6           
[127] car_3.0-8               performance_0.4.6       later_1.1.0.1          
[130] writexl_1.3             httr_1.4.2              gdtools_0.2.2          
[133] WDI_2.6.0               psych_2.0.7             effectsize_0.3.1       
[136] sjstats_0.18.0          colorspace_1.4-1        rvest_0.3.5            
[139] fs_1.4.1                radiant.data_1.3.9      ranger_0.12.1          
[142] reticulate_1.16         splines_4.0.2           statmod_1.4.34         
[145] sp_1.4-2                vegawidget_0.3.1        xgboost_1.1.1.1        
[148] sessioninfo_1.1.1       systemfonts_0.2.3       xtable_1.8-4           
[151] jsonlite_1.6.1          nloptr_1.2.2.1          testthat_2.3.2         
[154] R6_2.4.1                pillar_1.4.6            htmltools_0.5.0        
[157] mime_0.9                glue_1.4.1              fastmap_1.0.1          
[160] minqa_1.2.4             class_7.3-17            codetools_0.2-16       
[163] maps_3.3.0              pkgbuild_1.1.0          mvtnorm_1.1-1          
[166] curl_4.3                zip_2.0.4               openxlsx_4.1.5         
[169] rmarkdown_2.3           repr_1.1.0              desc_1.2.0             
[172] munsell_0.5.0           e1071_1.7-3             labelled_2.5.0         
[175] sjmisc_2.8.5            haven_2.3.1             reshape2_1.4.4         
[178] gtable_0.3.0            bayestestR_0.6.0